BSTP-ECG1 PROTEIN AND RELATED REAGENTS AND METHODS OF USE THEREOF

GOVERNMENT SUPPORT 

The U.S. Government has a paid-up license in this invention and the right in 
limited circumstances to require the patent owner to license others on reasonable 
terms as provided for by the terms of Grant No. NIH CA 77097 awarded by the 
National Cancer Institute. 

CROSS-REFERENCE TO RELATED APPLICATIONS 
This application claims priority to provisional applications U.S.S.N.
60/220,967, filed July 26, 2000, and U.S.S.N. 60/251,669, filed December 6, 2000, 
which are incorporated herein by reference. 

BACKGROUND OF THE INVENTION 

A major challenge of cancer treatment is to target specific therapies to distinct
tumor types in order to maximize efficacy and minimize toxicity. A related challenge
lies in the attempt to provide accurate diagnostic, prognostic, and predictive
information. At present, breast tumors are described with the tumor-node-metastasis
(TNM) system. This system, which uses the size of the tumor, the presence or
absence of tumor in regional lymph nodes, and the presence or absence of distant
metastases to assign a stage to the tumor, is described in the American Joint
Committee on Cancer: AJCC Cancer Staging Manual. Philadelphia, Pa: Lippincott-Raven
Publishers, 5th ed., 1997, pp 171-180, and further discussion is found in Harris,
J.R.: "Staging of breast carcinoma" in Harris, J.R., Hellman, S., Henderson, I.C.,
Kinne, D.W. (eds.): Breast Diseases. Philadelphia, Lippincott, 1991. The assigned
stage is used as a basis for selection of appropriate therapy and for prognostic
purposes. In addition to the TNM parameters, morphologic appearance is used to
further classify tumors and thereby aid in selection of appropriate therapy. However,
this approach has serious limitations. Tumors with similar histopathologic appearance
can exhibit significant variability in terms of clinical course and response to
therapy. For example, some tumors are rapidly progressive while others are not. Some
tumors respond readily to hormonal therapy or chemotherapy while others are resistant.

Assays for cell surface markers, e.g., using immunohistochemistry, have
provided means for dividing certain tumor types into subclasses. For example, one
factor considered in prognosis and in treatment decisions for breast cancer is the
presence or absence of the estrogen receptor (ER) in tumor samples. ER-positive
breast cancers typically respond much more readily to hormonal therapies such as
tamoxifen, which acts as an anti-estrogen in breast tissue, than ER-negative tumors.
Though useful, these analyses are subject to variability and provide only a very crude
basis for tumor classification. Therefore, there exists a need for improved methods for
classifying tumors.

Mutation or dysregulation of any of a large number of genes contributes to the
development and progression of cancer as discussed in Hanahan, D. and Weinberg,
R., The Hallmarks of Cancer, Cell, 100, 57-70, 2000. Genes that play a role in cancer
can be divided into a number of broad classes including oncogenes, tumor suppressor
genes, and genes that regulate apoptosis. Oncogenes such as ras typically encode
proteins whose activities promote cell growth and/or division, a function that is
necessary for normal physiological processes such as development, tissue
regeneration, and wound healing. However, inappropriate activity or expression of
oncogenes can lead to the uncontrolled cell proliferation that is a feature of cancer.
Tumor suppressor genes such as rb act as negative regulators of cell proliferation.
Loss of their activity, e.g., due to mutations or decreased expression at the level of
mRNA or protein, can lead to unrestrained cell division. A number of familial cancer
syndromes and inherited susceptibility to cancer are believed to be caused by
mutations in tumor suppressor genes. Apoptosis, or programmed cell death, plays
important roles both in normal development and in surveillance to eliminate cells
whose survival may be deleterious to the organism, e.g., cells that have acquired DNA
damage. Many chemotherapeutic agents are believed to work by activating the
endogenous apoptosis pathway in tumor cells.

Although a substantial number of genes have been implicated as playing
important roles in cancer, the factors responsible for the phenotypic diversity of
tumors remain largely unknown. In particular, understanding of the underlying
differences in gene expression that may contribute to tumor phenotype is limited.
Understanding the differences in gene expression between normal and cancerous
tissue and between different tumors of the same tissue type is of significant
diagnostic, prognostic, and therapeutic utility. There is therefore a need for the
identification of genes exhibiting differential expression in tumors. In particular,
there is a need for the identification of additional genes and proteins that can be used
to classify tumors, especially genes and proteins that can provide diagnostic,
prognostic, and/or predictive information in cancer. There is also a need for
antibodies and other reagents for the detection and measurement of such genes and
proteins.

Most of the commonly used chemotherapeutic agents act relatively
nonselectively. Rather than specifically killing tumor cells, these agents target any
dividing cell, resulting in a variety of adverse effects. In addition, current therapeutic
strategies are of limited efficacy, and the mortality rate of breast cancer remains high.
There is therefore a need for the identification of additional genes and proteins that
can be used as targets for the treatment of cancer. There is also a need for antibodies
and other reagents that can modulate, regulate, or interact with these genes and
proteins to provide new methods of treatment for cancer.



SUMMARY OF THE INVENTION 



The present invention relates to the identification of genes of particular import
in diagnosis, prognostication and/or therapeutic intervention in breast cancer and other
tumors based on their expression profile in human breast tumor samples, their
expression in other tissue and normal tissue samples, and in cell lines as assessed
using cDNA microarrays. In particular, the genes are identified based on their
differential expression across tumor samples.

The invention provides a substantially purified polypeptide and fragments
thereof that are encoded by an RNA molecule that is differentially expressed in human
breast tumor samples and cell lines. The polypeptide is referred to as BSTP-ECG1.
Thus in one aspect the invention provides a substantially purified polypeptide whose
amino acid sequence comprises the amino acid sequence set forth in SEQ ID NO:1.
The invention also provides polypeptides possessing homology to the polypeptide
having the sequence of SEQ ID NO:1 or to fragments of this polypeptide, wherein the
polypeptides are significantly similar to the polypeptide of SEQ ID NO:1. In certain
embodiments of the invention the definition of "significantly similar" can vary, as
described further below. In certain embodiments of the invention a significantly
similar polypeptide has one or more amino acid substitutions, deletions, and/or
additions with respect to the sequence of SEQ ID NO:1. In certain embodiments of
the invention the polypeptides are expression products of human genes. When
referring to polypeptides or polynucleotides whose sequence comprises the sequence
set forth in a SEQ ID NO:X, the set of polypeptides or polynucleotides includes those
polypeptides or polynucleotides having the particular sequence set forth in the SEQ
ID NO:X in addition to other polypeptides or polynucleotides including the sequence
of SEQ ID NO:X.

In another aspect, the invention provides a substantially isolated and purified
polynucleotide encoding the polypeptide of SEQ ID NO:1. In particular, the invention
provides a substantially isolated and purified polynucleotide whose sequence
comprises the sequence of SEQ ID NO:2. The invention further provides a
substantially isolated and purified polynucleotide whose sequence comprises the
sequence of SEQ ID NO:3 and also provides substantially isolated and purified
polynucleotides whose sequence comprises the sequence of SEQ ID NO:4 or SEQ ID
NO:5. The invention also provides a polynucleotide encoding a polypeptide
possessing significant similarity to the polypeptide of SEQ ID NO:1, where
"significantly similar" is defined below. The invention further provides a
polynucleotide having a sequence that is complementary to the sequence of SEQ ID
NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5. In addition, the invention
provides a polynucleotide having a sequence that is complementary to a polynucleotide
encoding a polypeptide possessing significant similarity to the polypeptide of SEQ ID
NO:1.

In another aspect, the invention provides an isolated and purified
polynucleotide that hybridizes under stringent conditions to a polynucleotide encoding
a polypeptide comprising or having the amino acid sequence set forth in SEQ ID
NO:1. In certain embodiments of the invention the polynucleotide encodes at least
10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%,
at least 80%, at least 90%, at least 95%, or at least 99% of the polypeptide comprising
the amino acid sequence set forth in SEQ ID NO:1. In particular, the invention
provides an isolated and purified polynucleotide that hybridizes under stringent
conditions to a polynucleotide having the sequence set forth in SEQ ID NO:2, SEQ ID
NO:3, SEQ ID NO:4, or SEQ ID NO:5, or fragments of any of these sequences.
The invention further provides an isolated and purified polynucleotide that hybridizes
under stringent conditions to a polynucleotide encoding a polypeptide having
significant similarity to a polypeptide comprising or having the amino acid sequence
set forth in SEQ ID NO:1. The invention also provides polynucleotides that hybridize
under moderately stringent conditions to the foregoing polynucleotides.

In another aspect, the invention provides a substantially purified
oligonucleotide that includes a region of nucleotide sequence that hybridizes to at
least 8 consecutive nucleotides of sense or antisense sequence of a nucleotide
sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:3, SEQ ID
NO:4, or SEQ ID NO:5. The invention also provides a substantially purified
oligonucleotide that includes a region of nucleotide sequence that hybridizes to at
least 8 consecutive nucleotides of sense or antisense sequence of a nucleotide
sequence that encodes the polypeptide of SEQ ID NO:1. In certain embodiments of
the invention the oligonucleotide is labeled, e.g., with a fluorescent moiety, enzyme,
enzyme substrate, or radioisotope, in order to facilitate detection of the
oligonucleotide. The oligonucleotide can be used, e.g., as a probe or a primer, to
detect the level of a polynucleotide encoding BSTP-ECG1 in cells and/or tissues, e.g.,
tissues isolated from a patient. The oligonucleotide can also be used to detect
mutations in the gene encoding BSTP-ECG1 and/or to detect amplification or altered
expression of the gene encoding BSTP-ECG1, in cells and/or tissues. In certain
embodiments of the invention the oligonucleotide is attached to an oligonucleotide
microarray. In certain embodiments of the invention the oligonucleotide has a
sequence capable of binding specifically with an RNA molecule that encodes a
polypeptide comprising an amino acid sequence set forth in SEQ ID NO:1 so as to
prevent appropriate processing, transport, or translation of the RNA molecule.
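
The criterion of a region hybridizing to at least 8 consecutive nucleotides of sense or antisense sequence can be illustrated with a minimal Python sketch. It assumes perfect Watson-Crick complementarity, which is a simplification of actual hybridization behavior, and it uses hypothetical helper names and a made-up target sequence that is not taken from SEQ ID NO:2 through SEQ ID NO:5.

```python
# Illustrative sketch only: helper names and sequences are hypothetical,
# and perfect base pairing is assumed in place of real hybridization kinetics.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def shares_window(a: str, b: str, k: int) -> bool:
    """True if some run of k consecutive nucleotides of a also occurs in b."""
    a, b = a.upper(), b.upper()
    windows = {b[i:i + k] for i in range(len(b) - k + 1)}
    return any(a[i:i + k] in windows for i in range(len(a) - k + 1))


def can_hybridize(oligo: str, target_sense: str, k: int = 8) -> bool:
    """Check whether oligo has a region perfectly complementary to at least
    k consecutive nucleotides of the sense or antisense strand of the target.
    """
    antisense = reverse_complement(target_sense)
    # A region that matches the antisense strand pairs with the sense strand,
    # and a region that matches the sense strand pairs with the antisense strand.
    return shares_window(oligo, antisense, k) or shares_window(oligo, target_sense, k)


# Hypothetical 20-nucleotide target, for demonstration only.
target = "ATGGCCAAGCTTGACCTGAA"
probe = reverse_complement(target[3:13])  # antisense probe against positions 3-12
print(can_hybridize(probe, target))
```

Storing the k-mers of one strand in a set keeps the check simple; a practical probe design would also consider melting temperature, mismatches, and the stringency of the conditions employed.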

In another aspect the invention provides vectors, e.g., plasmids, containing a
polynucleotide encoding a polypeptide comprising the sequence of SEQ ID NO:1.
The invention also provides vectors, e.g., plasmids, comprising a polynucleotide
encoding a polypeptide possessing significant similarity to the polypeptide of SEQ ID
NO:1. In certain embodiments of the invention the polynucleotide comprises the
polynucleotide sequence of SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID
NO:5. In certain embodiments of the invention the vectors contain genetic control
elements operably linked to the polynucleotide, wherein the genetic control elements
direct transcription of the polynucleotide. In certain embodiments of the invention the
vectors and genetic control elements are adapted for expression of the polynucleotide
in a bacterial cell, a yeast cell, an insect cell, or a mammalian cell. The invention
further provides host cells, e.g., bacterial, yeast, insect, and mammalian cells
containing an expression vector containing the polynucleotide encoding the
polypeptide having the sequence of SEQ ID NO:1. The invention further provides
host cells, e.g., bacterial, yeast, insect, and mammalian cells containing an expression
vector comprising a polynucleotide encoding a polypeptide having significant
similarity to the polypeptide of SEQ ID NO:1.

In another aspect, the invention provides an antibody that specifically binds to a
polypeptide whose sequence comprises the amino acid sequence of SEQ ID NO:1.
The invention further provides antibodies that specifically bind to a polypeptide
having significant similarity to the polypeptide of SEQ ID NO:1. The invention
further provides methods of detecting a polypeptide whose sequence comprises the
amino acid sequence of SEQ ID NO:1. The invention further provides methods of
detecting a polypeptide having significant similarity to the polypeptide of SEQ ID
NO:1. The polypeptides can be detected in a variety of different contexts. For
example, the polypeptides can be detected in lysates or extracts derived from cells or
tissues, in culture medium, in substantially intact cells or tissue samples (e.g., biopsy
specimens), and/or in the blood, urine, serum, ascites, or other body fluids or
secretions of a subject.

In another aspect, the invention provides a nonhuman transgenic organism
comprising a nonnative DNA molecule encoding a polypeptide comprising an amino
acid sequence set forth in SEQ ID NO:1, or an ortholog of such a polypeptide, and
including genetic control elements sufficient to direct transcription of the DNA
molecule in at least a subset of the organism's cells. The invention further provides a
nonhuman transgenic organism comprising a nonnative DNA molecule encoding a
polypeptide having significant similarity to the polypeptide of SEQ ID NO:1 and
including genetic control elements sufficient to direct transcription of the DNA
molecule in at least a subset of the organism's cells. In yet another aspect, the
invention provides a nonhuman organism in which a native DNA sequence encoding a
polypeptide having significant similarity to the polypeptide of SEQ ID NO:1 is
deleted, e.g., by homologous recombination.

In another aspect, the invention provides methods for detecting a
polynucleotide encoding a polypeptide comprising an amino acid sequence set forth in
SEQ ID NO:1, or a polypeptide having significant similarity to the polypeptide of
SEQ ID NO:1, in a biological sample. The sample may comprise cells, tissue, body
fluid, etc., isolated from a subject. The sample may be processed in a variety of ways
prior to application of the methods. For example, the sample may be subjected to
purification, reverse transcription, amplification, etc. One such method comprises
steps of: (a) hybridizing a nucleic acid complementary to the polynucleotide encoding
a polypeptide comprising an amino acid sequence set forth in SEQ ID NO:1, or a
polypeptide having significant similarity to the polypeptide of SEQ ID NO:1, to at
least one nucleic acid in the biological sample, thereby forming a hybridization
complex; and (b) detecting the hybridization complex, wherein the presence of the
hybridization complex indicates the presence of a polynucleotide encoding the
polypeptide in the biological sample. A second such method comprises steps of: (a)
hybridizing a nucleic acid encoding a polypeptide comprising an amino acid sequence
set forth in SEQ ID NO:1, or a polypeptide having significant similarity to the
polypeptide of SEQ ID NO:1, to at least one nucleic acid complementary to at least
one nucleic acid in the biological sample, thereby forming a hybridization complex;
and (b) detecting the hybridization complex, wherein the presence of the hybridization
complex indicates the presence of a polynucleotide encoding the polypeptide in the
biological sample.

In yet another aspect, the invention provides kits for detecting a polynucleotide
encoding a polypeptide comprising an amino acid sequence set forth in SEQ ID NO:1,
or a polypeptide having significant similarity to the polypeptide of SEQ ID NO:1.
The kits can comprise a polynucleotide that hybridizes specifically to the
polynucleotide encoding a polypeptide comprising an amino acid sequence set forth in
SEQ ID NO:1, or a polypeptide having significant similarity to the polypeptide of
SEQ ID NO:1 and, optionally, other materials such as suitable buffers, indicators
(e.g., fluorophores, chromophores or enzymes providing same), controls (e.g., an
appropriate polynucleotide of this invention), and directions for using the kit.

In another aspect, the invention provides methods for detecting a polypeptide
comprising an amino acid sequence set forth in SEQ ID NO:1, or a polypeptide
having significant similarity to the polypeptide of SEQ ID NO:1. One such method
comprises steps of: (a) contacting the biological sample with an antibody that
specifically binds to the polypeptide of SEQ ID NO:1, or a polypeptide having
significant similarity to the polypeptide of SEQ ID NO:1; and (b) determining
whether the antibody specifically binds to the sample, the binding being an indication
that the sample contains the polypeptide. The invention also provides kits for
performing these methods. The kits can comprise an antibody (preferably a
monoclonal antibody) that binds to the polypeptide and, optionally, other materials
such as suitable buffers, indicators (e.g., fluorophores, chromophores or enzymes
providing same), controls (e.g., a polypeptide of this invention) and directions for
using the kit.

In another aspect, the invention provides methods for producing a polypeptide
comprising the amino acid sequence of SEQ ID NO:1, or a polypeptide having
significant similarity to the polypeptide of SEQ ID NO:1. In one embodiment the
method includes the steps of providing a cell that expresses the polypeptide or
polypeptide fragment, e.g., a cell containing an expression vector containing a
polynucleotide encoding the polypeptide or polypeptide fragment operably linked to
genetic control elements that direct transcription of the polynucleotide; maintaining
the cell under conditions wherein the polypeptide is produced; harvesting the
polypeptide from the cell and/or the culture medium; and, optionally, purifying the
polypeptide.

In another aspect, the invention provides methods of predicting whether an
individual is at risk for a condition featuring inappropriate cell division, e.g., cancer.
One such method comprises the step of determining whether there exists, within a cell
and/or tissue of an individual, a mutation in a gene encoding a polypeptide having an
amino acid sequence selected from the group consisting of SEQ ID NO:1, or a
polypeptide having significant similarity to the polypeptide of SEQ ID NO:1, or
whether there exists, within a cell and/or tissue of an individual, a mutation in a
regulatory sequence for such a gene. A second such method comprises the step of
determining whether an individual expresses a particular allele or variant of a gene
comprising a nucleotide sequence set forth in SEQ ID NO:2, SEQ ID NO:3, SEQ ID
NO:4, or SEQ ID NO:5, such allele or variant being present within the general
population and inherited from either of the individual's parents rather than
constituting a de novo mutation within the organism. A third such method comprises
the step of determining whether there exists, within a cell and/or tissue of an
individual, inappropriate expression of a polynucleotide encoding a polypeptide
having an amino acid sequence set forth in SEQ ID NO:1 or a polypeptide having
significant similarity to the polypeptide of SEQ ID NO:1. A fourth such method
comprises the step of determining whether there exists, within a cell and/or tissue of
an individual, inappropriate expression of a polypeptide having an amino acid
sequence set forth in SEQ ID NO:1 or a polypeptide having significant similarity to
the polypeptide of SEQ ID NO:1.

In another aspect, the invention provides methods of classifying a disease,
particularly of classifying tumors. One such method includes steps of: (a) obtaining
cells or tissue from a site of disease; (b) detecting a polynucleotide encoding a
polypeptide having a sequence selected from the group consisting of SEQ ID NO:1 or
a polypeptide having significant similarity to the polypeptide of SEQ ID NO:1, or a
complement of such a polynucleotide; and (c) assigning the disease to one of a set of
predetermined categories based on detection of the polynucleotide. The method can
further comprise the step of providing diagnostic, prognostic, or predictive
information based on the category assigned in the assigning step.
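
Step (c) above, assigning a sample to one of a set of predetermined categories based on detection, might be sketched as follows. This is a minimal illustration under stated assumptions: the category names, the cut-points, and the use of a relative expression ratio (sample versus reference) are all hypothetical, since the text does not specify how categories are defined.

```python
from bisect import bisect_right

# Hypothetical cut-points on a relative expression ratio (sample / reference)
# and hypothetical category labels; neither comes from the source text.
CUTPOINTS = [0.5, 2.0]
CATEGORIES = ["low", "intermediate", "high"]


def classify_by_expression(relative_level: float) -> str:
    """Assign a sample to a predetermined category from its relative
    expression level. A level equal to a cut-point falls in the higher category.
    """
    if relative_level < 0:
        raise ValueError("expression level cannot be negative")
    return CATEGORIES[bisect_right(CUTPOINTS, relative_level)]


print(classify_by_expression(1.0))
```

Keeping the cut-points in a sorted list makes the categories easy to revise without changing the assignment logic.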

In certain embodiments of the invention the method is used to classify breast
tumors. In certain embodiments of the invention the method is based on detecting
expression of a single gene, e.g., a gene encoding the polypeptide of SEQ ID NO:1.
Detecting expression of such a gene may comprise detecting the polynucleotide
sequence set forth in SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5.
Detecting expression may comprise measuring either a relative or absolute level of
expression. In certain embodiments of the invention the method is based on an
assessment of the expression of multiple polynucleotides as described further in
copending U.S. Patent Application "Reagents and Methods for Use in Managing
Breast Cancer", filed July 26, 2001 and in U.S. Provisional Patent Application Ser. No.
60/220,967, filed July 26, 2000. These applications are referred to herein as "the gene
subset application". The multiple polynucleotides can include all of the
polynucleotides disclosed therein or any subset thereof.

In another aspect, the invention provides a method of classifying a tumor by
detection of the polypeptide of SEQ ID NO:1 or a polypeptide having significant
similarity to the polypeptide of SEQ ID NO:1 in cells and/or tissue samples obtained
from the tumor or from elsewhere in the body (e.g., in blood, urine, ascites, other
body fluids or secretions or excretions). In a preferred embodiment the method is
used to classify breast tumors. In certain embodiments the method is based on the
detection of a single polypeptide, e.g., the polypeptide of SEQ ID NO:1 or a
polypeptide having significant similarity to the polypeptide of SEQ ID NO:1.
However, in other embodiments the method is based on an assessment of the
expression of multiple polypeptides. The multiple polypeptides can include all of the
polypeptides disclosed in the gene subset application mentioned above or any subset
thereof. In certain embodiments the level of expression (either relative or absolute) of
the polypeptide is measured. In other embodiments the pattern of expression of the
polypeptide within cells or within a tissue sample is assessed. Regardless of the
method by which a tumor is assigned to a predetermined category, the assignment
may be used as a basis to provide diagnostic, prognostic, and/or predictive
information to the patient having the tumor.

In another aspect, the invention provides a microarray for use in classifying
tumors, comprising a polynucleotide whose sequence comprises or is complementary
to that set forth in SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5, or
whose sequence is of sufficient length to specifically bind to such a sequence under
the microarray hybridization conditions employed. Such conditions may be those
described in the Examples, or any conditions appropriate for the particular microarray
and detection technology employed. The invention further provides a microarray for
use in classifying tumors comprising a polynucleotide, e.g., a cDNA or an
oligonucleotide, capable of binding specifically to a polynucleotide encoding the
polypeptide having the amino acid sequence set forth in SEQ ID NO:1, or capable of
binding specifically to a polypeptide having significant similarity to the polypeptide
of SEQ ID NO:1, or complementary to such polynucleotides.

In another aspect, the invention provides a pharmaceutical composition
comprising a substantially purified polypeptide having the amino acid sequence set
forth in SEQ ID NO:1 or a polypeptide having significant similarity to the polypeptide
of SEQ ID NO:1. In certain embodiments of the invention the pharmaceutical
composition further preferably comprises a pharmaceutically acceptable carrier.

In yet another aspect, the invention provides a pharmaceutical composition
comprising a substantially purified antibody that binds to a polypeptide having an
amino acid sequence set forth in SEQ ID NO:1 or a polypeptide having significant
similarity to the polypeptide of SEQ ID NO:1. In certain embodiments of the
invention the pharmaceutical composition further preferably comprises a
pharmaceutically acceptable carrier. In certain embodiments the antibody is modified,
e.g., by attaching a toxic molecule thereto.

The invention provides methods for identifying modulators of the expression
or activity of a polypeptide comprising the amino acid sequence set forth in SEQ ID
NO:1 or a polypeptide having significant similarity to the polypeptide of SEQ ID
NO:1. The invention further provides agonists and antagonists capable of modulating
the expression or activity of a polypeptide comprising the amino acid sequence set
forth in SEQ ID NO:1 or a polypeptide having significant similarity to the polypeptide
of SEQ ID NO:1. The invention provides pharmaceutical compositions including
such modulators and methods of use thereof for the treatment or prevention of cancer,
particularly breast cancer.

In another aspect, the invention provides a method for the treatment or
prevention of cancer comprising the step of administering to an individual in need
thereof, a pharmaceutical composition comprising a polypeptide having an amino acid
sequence comprising the sequence of SEQ ID NO:1 or a polypeptide having
significant similarity to the polypeptide of SEQ ID NO:1. In another aspect, the
invention provides a method for the treatment or prevention of cancer comprising the
step of administering to an individual in need thereof, a pharmaceutical composition
comprising an antibody or a modified antibody that binds to a polypeptide having an
amino acid sequence set forth in SEQ ID NO:1, or a polypeptide having significant
similarity to the polypeptide of SEQ ID NO:1.

In another aspect, the invention provides a method of treating or preventing a
tumor comprising steps of: (i) providing an individual in need of treatment or
prevention of a tumor; and (ii) administering a compound that enhances the level or
activity of a polypeptide comprising the amino acid sequence of SEQ ID NO:1 or a
polypeptide having significant similarity to the polypeptide of SEQ ID NO:1. In
certain embodiments of the invention the compound is provided as a component of a
pharmaceutical composition. The invention also includes such a pharmaceutical
composition.

In another aspect, the invention provides methods of inhibiting growth of a
cell comprising enhancing the level or activity of a polypeptide comprising the amino
acid sequence of SEQ ID NO:1 or a polypeptide having significant similarity to the
polypeptide of SEQ ID NO:1 in the cell. The cell can be a normal cell or a tumor cell,
e.g., a breast tumor cell. According to certain of the methods the level of the
polypeptide is enhanced by overexpressing the polypeptide in the cell, e.g., by
introducing an expression vector or other nucleotide sequence (e.g., DNA, RNA, or
modified nucleotides, etc.) that encodes the polypeptide into the cell.

In another aspect, the invention provides a method of increasing cell growth
comprising decreasing the level or activity of a polypeptide comprising the amino acid
sequence of SEQ ID NO:1 or a polypeptide having significant similarity to the
polypeptide of SEQ ID NO:1 in the cell. Various methods for decreasing the level of a
polypeptide in a cell are known in the art. Such methods include, for example, the
introduction or expression of antisense nucleic acids or double-stranded RNA
(RNA-mediated interference) into the cell. Another such method is introduction or
expression of dominant negative polypeptides into the cell.



WO 02/08260 



PCT/US01/23439 



According to one aspect, the invention provides a method of classifying a
tumor comprising the steps of (i) providing a tumor sample; (ii) detecting expression
or activity of a gene encoding the polypeptide of SEQ ID NO:1 in the sample; and (iii)
classifying the tumor as belonging to a tumor subclass based on the results of the
detecting step. The detecting step may comprise detecting the polypeptide. A variety
of detection techniques may be employed including, but not limited to,
immunohistochemical analysis, ELISA assay, antibody arrays, or detecting
modification of a substrate by the polypeptide.

In certain embodiments of the methods the tumor is a breast tumor and the
tumor subclass is a luminal tumor subclass. The methods may further comprise
providing diagnostic, prognostic, or predictive information based on the classifying
step. Classifying may include stratifying the tumor (and thus stratifying a subject
having the tumor), e.g., for a clinical trial. The methods may further comprise
selecting a treatment based on the classifying step. In clinical research, stratification is
the process or result of describing or separating a patient population into more
homogeneous subpopulations according to specified criteria. Stratifying patients
initially rather than after the trial is frequently preferred, e.g., by regulatory agencies
such as the U.S. Food and Drug Administration that may be involved in the approval
process for a medication. In some cases stratification may be required by the study
design. Various stratification criteria may be employed in conjunction with detection
of expression of one or more basal marker genes. Commonly used criteria include
age, family history, lymph node status, tumor size, tumor grade, etc. Other criteria
including, but not limited to, tumor aggressiveness, prior therapy received by the
patient, ER and/or PR positivity, Her2/neu status, p53 status, various other biomarkers,
etc., may also be used. Stratification is frequently useful in performing statistical
analysis of the results of a trial.
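
By way of illustration only, stratification as described above may be sketched as grouping subjects according to their values for the chosen criteria. The following Python sketch is a minimal example under stated assumptions: the function name, the field names (marker_positive, node_status), and the subject records are hypothetical and are not drawn from the data described herein.

```python
from collections import defaultdict


def stratify(subjects, criteria):
    """Group subject IDs into strata keyed by their values for the criteria."""
    strata = defaultdict(list)
    for subject in subjects:
        # The stratum key is the tuple of this subject's values, in criteria order.
        key = tuple(subject[c] for c in criteria)
        strata[key].append(subject["id"])
    return dict(strata)


# Hypothetical subjects; field names and values are illustrative assumptions.
subjects = [
    {"id": "S1", "marker_positive": True, "node_status": "negative"},
    {"id": "S2", "marker_positive": True, "node_status": "positive"},
    {"id": "S3", "marker_positive": False, "node_status": "negative"},
    {"id": "S4", "marker_positive": True, "node_status": "negative"},
]

strata = stratify(subjects, ["marker_positive", "node_status"])
for key, ids in strata.items():
    print(key, ids)
```

Each resulting stratum is a more homogeneous subpopulation with respect to the selected criteria; any number of criteria may be combined in the key.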

In another aspect, the invention provides a method of testing a subject 
comprising the steps of (i) providing a sample isolated from a subject, (ii) detecting 
expression or activity of a gene encoding the polypeptide of SEQ ID NO:1 in the
sample, and (iii) providing diagnostic, prognostic, or predictive information based on
the detecting step. The detecting step may comprise detecting the polypeptide. 
Detection may be performed using any appropriate technique including, but not
limited to, immunohistochemistry, ELISA assay, protein array, or detecting
modification of a substrate by the polypeptide. 

The sample may comprise mRNA, in which case the detecting step may 
comprise hybridizing the mRNA or cDNA or RNA synthesized from the mRNA to a 
microarray or detecting mRNA transcribed from the gene or detecting cDNA or RNA
synthesized from mRNA transcribed from the gene. In any of the above methods, the
sample may be a blood sample, a urine sample, a serum sample, an ascites sample, a
saliva sample, a cell, or a portion of tissue.

According to another aspect, the invention provides a method of testing a
compound or a combination of compounds for activity against tumors comprising
steps of (i) obtaining or providing tumor samples taken from subjects who have been
treated with the compound or combination of compounds, wherein the tumors fell
within a tumor subclass, (ii) comparing the response rate of tumors that fell within the
tumor subclass and have been treated with the compound with the overall response
rate of tumors that have been treated with the compound or combination of
compounds or with the response rate of tumors that do not fall within the subclass and
have been treated with the compound or combination of compounds, and (iii)
identifying the compound or combination of compounds as having selective activity
against tumors in the tumor subclass if the response rate of tumors in the subclass is
greater than the overall response rate or the response rate of tumors that do not fall
within the subclass. In certain embodiments of the invention the tumors are breast
tumors. In certain embodiments of the invention the tumor subclass is a luminal
tumor subclass. The tumors may be classified according to any of the inventive
classification methods described above. In certain embodiments of the invention the
classification is based on expression of the polypeptide of SEQ ID NO:1.
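
The comparison of steps (ii) and (iii) above may be illustrated by the following minimal Python sketch. It is provided only as an example under stated assumptions: the record fields (subclass, responded), the function names, and the tumor records are hypothetical, and the sketch performs a bare comparison of rates without any test of statistical significance, which an actual study would be expected to include.

```python
def response_rate(tumors):
    """Fraction of tumors recorded as responders, or None for an empty group."""
    if not tumors:
        return None
    return sum(1 for t in tumors if t["responded"]) / len(tumors)


def has_selective_activity(tumors, subclass):
    """Compare the in-subclass response rate with the overall and out-of-subclass rates."""
    in_sub = [t for t in tumors if t["subclass"] == subclass]
    out_sub = [t for t in tumors if t["subclass"] != subclass]
    rate_in = response_rate(in_sub)
    rate_all = response_rate(tumors)
    rate_out = response_rate(out_sub)
    if rate_in is None:
        return False
    # Step (iii): the subclass rate exceeds the overall rate or the rate outside the subclass.
    exceeds_overall = rate_all is not None and rate_in > rate_all
    exceeds_outside = rate_out is not None and rate_in > rate_out
    return exceeds_overall or exceeds_outside


# Hypothetical treated tumors: 3 of 4 luminal tumors respond, 1 of 4 others respond.
treated = (
    [{"subclass": "luminal", "responded": r} for r in (True, True, True, False)]
    + [{"subclass": "other", "responded": r} for r in (True, False, False, False)]
)

selective = has_selective_activity(treated, "luminal")
print(response_rate(treated), selective)
```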

The invention further provides a method of testing a compound or a 
combination of compounds for activity against tumors comprising steps of (i) treating 
subjects in need of treatment for tumors with the compound or combination of 
compounds, (ii) comparing the response rate of tumors that fell within a tumor
subclass with the overall response rate of tumors or with the response rate of tumors
that do not fall within the subclass, and (iii) identifying the compound or combination
of compounds as having selective activity against tumors in the tumor subclass if the
response rate of tumors in the subclass is greater than the overall response rate or the
response rate of tumors that do not fall within the subclass. The method may further
comprise various additional steps. For example, the method may comprise steps of (i)
providing tumor samples from subjects in need of treatment for tumors, (ii)
determining whether the tumors fall within a tumor subclass, and (iii) stratifying the
subjects based on the results of the determining step prior to performing the treating
step. The method may further comprise the steps of (i) providing tumor samples from
subjects in need of treatment for tumors, (ii) detecting expression or activity of a gene
encoding the polypeptide of SEQ ID NO:1 in the samples, and (iii) stratifying the
subjects based on the results of the detecting step prior to performing the treating step.

In addition, the invention includes a method of testing a compound or a 
combination of compounds for activity against tumors comprising steps of (i) treating 
subjects in need of treatment for tumors with the compound or combination of
compounds or with an alternate compound, wherein the tumors fall within a tumor
subclass, (ii) comparing the response rate of tumors treated with the compound or
combination of compounds with the response rate of tumors treated with the alternate
compound, and (iii) identifying the compound or combination of compounds as
having superior activity against tumors in the tumor subclass, as compared with the
alternate compound, if the response rate of tumors treated with the compound or
combination of compounds is greater than the response rate of tumors treated with the
alternate compound. The method may further comprise various additional steps. For
example, the method may comprise steps of (i) providing tumor samples from
subjects in need of treatment for tumors, (ii) determining whether the tumors fall
within a tumor subclass, and (iii) stratifying the subjects based on the results of the
determining step prior to performing the treating step. The method may further
comprise the steps of (i) providing tumor samples from subjects in need of treatment
for tumors, (ii) detecting expression or activity of a gene encoding the polypeptide of
SEQ ID NO:1 in the samples, and (iii) stratifying the subjects based on the results of
the detecting step prior to performing the treating step.

In certain embodiments of the invention the alternate compound is a 
compound approved by the U.S. Food and Drug Administration for treatment of
tumors. The invention also provides a method of treating a subject comprising steps
of (i) identifying a subject as having a tumor in a luminal tumor subclass, and (ii) 
administering to the subject a compound identified according to any of the inventive 
methods for identifying a subject. 


BRIEF DESCRIPTION OF THE DRAWING 

Figure 1A presents the sequence of BSTP-ECG1 (SEQ ID NO:1).

Figure 1B presents the polynucleotide sequence of an open reading frame that encodes
BSTP-ECG1 (SEQ ID NO:2).

Figures 1C and 1D present the sequences of two polynucleotides that encode
BSTP-ECG1 (SEQ ID NO:3 and SEQ ID NO:4). These polynucleotides are cDNAs that
represent multiple mRNA isoforms arising due to alternate 3' polyadenylation sites.

Figure 2 presents the consensus sequence derived from I.M.A.G.E. clones 161484,
48805, 1276329, 1343900, and 1560906 (SEQ ID NO:5).

Figure 3 presents an alignment of BSTP-ECG1 with a number of related proteins
identified in GenBank.

Figure 4 presents a sequence map in which the predicted transmembrane domain of
BSTP-ECG1 (amino acids 66 - 115) is highlighted in gray.

Figure 5A presents a Kyte-Doolittle hydrophobicity plot for BSTP-ECG1.

Figure 5B presents a prediction of transmembrane regions and orientation for
BSTP-ECG1 obtained using the program TMpred.

Figure 5C presents a prediction of transmembrane helices for BSTP-ECG1 produced
using a Hidden Markov Model.

Figure 6A presents a Northern blot showing expression of BSTP-ECG1 in various cell
lines.

Figure 6B presents a longer exposure of the image in Figure 6A. 

BRIEF DESCRIPTION OF THE TABLES

The tables contain the numerical data corresponding to microarray images. Other 
tables provide additional information or list the individual genes in the various gene 
subsets. 

Table 1 is a master data table for the 65 microarray experiments performed on
individual tumor samples, in which rows represent I.M.A.G.E. clones that identify
approximately 1753 genes whose expression varied by at least a factor of 4 and
columns represent individual microarray experiments. The first 50 pages of the table
consist of a reference list in which a descriptive name for each clone (where such a
name exists) appears in the column entitled Name, followed by the Genbank
accession number for the clone. Each row in the reference list contains a number in
the first column that numerically identifies the column. In the subsequent data portion
of the table (pages 1 - 392), each row is similarly identified by a number in the first
column so that the name and Genbank accession number for the clone for which data
appears in that row may be determined by consulting the reference list. In the data
portion of the table, the column headings in the first row identify the tumor samples.
Each data cell in the table represents the measured Cy5/Cy3 fluorescence ratio at the
corresponding target element on the appropriate array. Empty cells indicate
insufficient or missing data. All ratio values are log transformed (base 2) to treat
inductions or repressions of identical magnitude as numerically equal but with
opposite sign.
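
The effect of the base 2 log transformation may be seen in a brief worked example. The following Python sketch is illustrative only; the function name and the fluorescence intensities are hypothetical and are not taken from the tables.

```python
import math


def log2_ratio(cy5, cy3):
    """Base 2 logarithm of the Cy5/Cy3 fluorescence ratio for one array element."""
    return math.log2(cy5 / cy3)


# Hypothetical intensities: a four-fold induction and a four-fold repression.
induced = log2_ratio(400.0, 100.0)
repressed = log2_ratio(100.0, 400.0)
print(induced, repressed)
```

On the untransformed scale the two ratios are 4 and 0.25, which lie at unequal distances from the no-change ratio of 1. After transformation they are equal in magnitude and opposite in sign, and a ratio of 1 maps to 0.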

Table 2 is a master data table for the 19 microarray experiments performed on cell line
samples, in which rows represent I.M.A.G.E. clones that identify approximately 1753
genes whose expression varied by at least a factor of 4 and columns represent
individual microarray experiments. This table contains only a data portion, in which
the column headings in the first row identify the cell lines. Each row in the table is
identified by a number which appears in the first column. The same reference list that
forms part of Table 1 may be consulted to determine the name and Genbank accession
number for the clone for which data appears in that row. Each data cell in the table
represents the measured Cy5/Cy3 fluorescence ratio at the corresponding target
element on the appropriate array. Empty cells indicate insufficient or missing data.
All ratio values are log transformed (base 2) to treat inductions or repressions of
identical magnitude as numerically equal but with opposite sign.

Table 3 presents a listing and description of the 11 cell lines used to create the
common reference sample.

Table 4 presents a complete listing of the 84 experimental samples that were assayed 
versus the common reference sample. The table includes a list of alternate names (in 
the column entitled Sample ID/old name) for the same tumors. The alternate names
are used to identify the tumor samples in certain contexts, and the table allows
conversion between the two sets of names. 

Table 5 lists the tumors used in the experiments described herein, along with clinical 
and pathological information about each tumor/patient. 


Table 6 is a master data table for the 84 microarray experiments performed on
individual tumor, tissue, and cell line samples, in which rows represent I.M.A.G.E.
clones that identify the 496 genes in the intrinsic gene set, and columns represent
individual microarray experiments. The first 15 pages of the table consist of a
reference list in which a descriptive name for each clone (where such a name exists)
appears in the column entitled Name, followed by the Genbank accession number for
the clone. Each row in the reference list contains a number in the first column that
numerically identifies the column. In the subsequent data portion of the table (pages
1 - 91), each row is similarly identified by a number in the first column so that the
name and Genbank accession number for the clone for which data appears in that row
may be determined by consulting the reference list. In the data portion of the table,
the column headings in the first row identify the tumor samples. Each data cell in the
table represents the measured Cy5/Cy3 fluorescence ratio at the corresponding target
element on the appropriate array. Empty cells indicate insufficient or missing data.
All ratio values are log transformed (base 2) to treat inductions or repressions of
identical magnitude as numerically equal but with opposite sign.

Table 7 is a listing of the 374 clones that identify genes selected for the epithelial 
enriched gene set including Genbank accession numbers. 

Table 8 is a listing of the clones that identify genes that comprise the luminal subset 
including Genbank accession numbers.

Tables 9-1 and 9-2 are listings of the two groups of clones that identify genes that
comprise the basal subset including Genbank accession numbers.

Table 10 is a listing of the clones that identify genes that comprise the ErbB2 subset
including Genbank accession numbers.

Table 11 is a listing of the clones that identify genes that comprise the endothelial
gene subset including Genbank accession numbers.

Table 12 is a listing of the clones that identify genes that comprise the
stromal/fibroblast gene subset including Genbank accession numbers.

Table 13 is a listing of the clones that identify genes that comprise the B-cell gene
subset including Genbank accession numbers.

Table 14 is a listing of the clones that identify genes that comprise the adipose-
enriched/normal breast gene subset including Genbank accession numbers.

Table 15 is a listing of the clones that identify genes that comprise the macrophage
gene subset including Genbank accession numbers.

Table 16 is a listing of the clones that identify genes that comprise the T-cell gene
subset including Genbank accession numbers.

In Table 1, the Genbank accession number for each clone appears in the column 
entitled "Name" following a brief descriptive name for the gene identified by the
clone, where available. In some cases the descriptive name is a number corresponding
to an I.M.A.G.E. clone ID number. As is well known and accepted in the art, the
Genbank accession number represents a means of definitively identifying a particular
clone, since Genbank accession numbers will be maintained permanently or, if
changed, the change will be accomplished in such a manner as to allow unambiguous
correlation between any new numbering system and the numbering system currently
in use. 

Note that Tables 1, 2, and 6 are provided for purposes of presenting the clone 
identifications and the data that was used to perform hierarchical clustering analysis, 
and that the format of the tables may not correspond exactly with the format required
by software developed for the analysis of the data. Appropriate format will, in
general, depend upon the particular computer program. See, for example, the Web
site http://genome-www.stanford.edu/~sherlock/tutorial.html for discussion of the
appropriate format for one particular analysis program. 


In Tables 7-16, each entry identifies a clone. The first portion of each entry is a 
brief descriptive name for the gene identified by the clone. The Genbank accession 
number for the clone appears on the last line of the entry for that clone. 

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

DEFINITIONS 

To facilitate understanding of the description of the invention, the following 
definitions are provided. It is to be understood that, in general, terms not otherwise 
defined are to be given their meaning or meanings as generally accepted in the art. 

Agonist: As used herein, the term "agonist" refers to a molecule that increases or
prolongs the duration of the effect of a polypeptide or a nucleic acid. Agonists may
include proteins, nucleic acids, carbohydrates, lipids, small molecules, ions, or any
other molecules that modulate the effect of the polypeptide or nucleic acid. An
agonist may be a direct agonist, in which case it is a molecule that exerts its effect by
binding to the polypeptide or nucleic acid, or an indirect agonist, in which case it
exerts its effect via a mechanism other than binding to the polypeptide or nucleic acid
(e.g., by altering expression or stability of the polypeptide or nucleic acid, by altering
the expression or activity of a target of the polypeptide or nucleic acid, by interacting
with an intermediate in a pathway involving the polypeptide or nucleic acid, etc.).

Antagonist: As used herein, the term "antagonist" refers to a molecule that decreases
or reduces the duration of the effect of a polypeptide or a nucleic acid. Antagonists
may include proteins, nucleic acids, carbohydrates, or any other molecules that
modulate the effect of the polypeptide or nucleic acid. An antagonist may be a direct
antagonist, in which case it is a molecule that exerts its effect by binding to the
polypeptide or nucleic acid, or an indirect antagonist, in which case it exerts its effect
via a mechanism other than binding to the polypeptide or nucleic acid (e.g., by
altering expression or stability of the polypeptide or nucleic acid, by altering the
expression or activity of a target of the polypeptide or nucleic acid, by interacting with
an intermediate in a pathway involving the polypeptide or nucleic acid, etc.).

Corresponding to: In general, the phrase "corresponding to" has its commonly
accepted meaning indicating, typically, a relationship between two entities, etc. For
example, an mRNA corresponds to a gene if the mRNA is transcribed from the gene.
A protein corresponds to a gene if the protein is translated from an mRNA transcribed
from the gene. A cDNA corresponds to an mRNA if the cDNA is synthesized by
reverse transcription of the mRNA. In addition, and without limitation, as used
herein, an mRNA corresponds to a clone on a microarray when the mRNA (or cDNA
derived therefrom) hybridizes specifically (under the experimental conditions
described) to the clone or to its complement, e.g., when the sequence of the mRNA
(or cDNA derived therefrom) and the sequence of the clone are sufficiently
complementary to one another for specific hybridization to occur. Similarly, a gene
corresponds to a clone on a microarray when mRNA transcribed from the gene
corresponds to the clone. Note that it is not necessary that the entire mRNA, cDNA,
etc. hybridize with the clone or vice versa. For example, the mRNA or cDNA may be
longer or shorter than the clone. The clone may be longer or shorter than the mRNA
or cDNA. Either or both of the mRNA/cDNA and the clone may contain one or more
stretches of sequence that is/are not contained within the corresponding nucleic acid.

Diagnostic information: As used herein, diagnostic information or information for 
use in diagnosis is any information that is useful in determining whether a patient has 
a disease or condition and/or in classifying the disease or condition into a phenotypic
category or any category having significance with regards to the prognosis of or likely
response to treatment (either treatment in general or any particular treatment) of the
disease or condition. Similarly, diagnosis refers to providing any type of diagnostic
information, including, but not limited to, whether a subject is likely to have a
condition (such as a tumor), information related to the nature or classification of a
tumor, information related to prognosis and/or information useful in selecting an
appropriate treatment. Selection of treatment may include the choice of a particular 
chemotherapeutic agent or other treatment modality such as surgery, radiation, etc., a 
choice about whether to withhold or deliver therapy, etc. 

Differential expression: A gene exhibits differential expression at the RNA level if its
RNA transcript varies in abundance between different samples in a sample set. A
gene exhibits differential expression at the protein level, if a polypeptide encoded by
the gene varies in abundance between different samples in a sample set. In the
context of a microarray experiment, differential expression generally refers to
differential expression at the RNA level.
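
One simple way to flag variation in transcript abundance across a sample set may be sketched as follows. This Python example is illustrative only and rests on an assumption: it treats a gene as varying when its highest and lowest base 2 log ratios across the samples differ by at least a chosen fold factor. It is not presented as the selection criterion actually applied to the data described herein, and the function name and values are hypothetical.

```python
import math


def varies_by_factor(log2_ratios, factor=4.0):
    """True if the spread of the log2 ratios corresponds to at least `factor`-fold variation."""
    # Empty cells (insufficient or missing data) are represented here as None and skipped.
    values = [v for v in log2_ratios if v is not None]
    if len(values) < 2:
        return False
    return max(values) - min(values) >= math.log2(factor)


# Hypothetical log2 ratios for two genes across three samples.
gene_a = [1.5, None, -0.5]   # spread of 2.0 log2 units, i.e. four-fold
gene_b = [0.2, 0.9, 0.4]     # spread of 0.7 log2 units

print(varies_by_factor(gene_a), varies_by_factor(gene_b))
```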

Gene: For the purposes of the present invention, the term "gene" has its meaning as 
understood in the art. However, it will be appreciated by those of ordinary skill in the 
art that the term "gene" has a variety of meanings in the art, some of which include 
gene regulatory sequences (e.g., promoters, enhancers, etc.) and/or intron sequences,
3' untranslated regions, etc., and others of which are limited to coding sequences. It
will further be appreciated that definitions of "gene" include references to nucleic
acids that do not encode proteins but rather encode functional RNA molecules such as
tRNAs. For the purpose of clarity we note that, as used in the present application, the
term "gene" generally refers to a portion of a nucleic acid that encodes a protein; the
term may optionally encompass regulatory sequences. This definition is not intended
to exclude application of the term "gene" to non-protein coding expression units but
rather to clarify that, in most cases, the term as used in this document refers to a 
protein coding nucleic acid. 

Gene product or expression product: A gene product or expression product is, in 
general, an RNA transcribed from the gene or a polypeptide encoded by an RNA
transcribed from the gene. 

Homology: The term "homology" refers to a degree of similarity between two or
more nucleic acid sequences or between two or more amino acid sequences. As is
well known in the art, given any nucleotide or amino acid sequence, homologous
sequences may be identified by searching databases (e.g., GenBank, EST [expressed
sequence tag] databases, GST [gene sequence tag] databases, GSS [genome survey
sequence] databases, organism sequencing project databases) using computer
programs such as BLASTN for nucleotide sequences and BLASTP, gapped BLAST,
and PSI-BLAST for amino acid sequences. These programs are described in Altschul,
SF, et al., Basic local alignment search tool, J. Mol. Biol., 215(3): 403-410, 1990,
Altschul, SF and Gish, W, Methods in Enzymology, and Altschul, SF, et al., "Gapped
BLAST and PSI-BLAST: a new generation of protein database search programs",
Nucleic Acids Res. 25:3389-3402, 1997. In addition to identifying homologous
sequences, the programs mentioned above typically provide an indication of the
degree of homology. Determining the degree of identity or homology that exists
between two or more amino acid sequences or between two or more nucleotide
sequences can also be conveniently performed using any of a variety of other
algorithms and computer programs known in the art. Discussion and sources of
appropriate programs may be found, for example, in Baxevanis, A., and Ouellette,
B.F.F., Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins,
Wiley, 1998; and Misener, S. and Krawetz, S. (eds.), Bioinformatics Methods and
Protocols (Methods in Molecular Biology, Vol. 132), Humana Press, 1999.
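
As a simple illustration of a degree of identity, the percentage of identical positions between two sequences that have already been aligned may be computed as sketched below in Python. The sketch makes several assumptions: the sequences are supplied pre-aligned and of equal length, gaps are written as "-", and columns containing a gap are excluded from the denominator. Other conventions for the denominator exist, and the programs cited above report their own statistics. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of non-gap alignment columns in which the two residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned peptide fragments: 6 compared columns, 5 identical.
identity = percent_identity("MKT-LLV", "MKTALIV")
print(round(identity, 1))
```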

Operably linked: The term "operably linked" refers to a relationship between two 
nucleic acid sequences wherein the expression of one of the nucleic acid sequences is 
controlled by, regulated by, modulated by, etc. the other nucleic acid sequence. For
example, a promoter is operably linked with a coding sequence if the promoter
controls transcription of the coding sequence. Preferably a nucleic acid sequence that
is operably linked to a second nucleic acid sequence is covalently linked, either
directly or indirectly, to such a sequence, although any effective three-dimensional
association is acceptable.

Prognostic information and predictive information: As used herein the terms
prognostic information and predictive information are used interchangeably and
somewhat informally to refer to any information that may be used to foretell any
aspect of the course of a disease or condition either in the absence or presence of
treatment. Such information may include, but is not limited to, the average life
expectancy of a patient, the likelihood that a patient will survive for a given amount of
time (e.g., 6 months, 1 year, 5 years, etc.), the likelihood that a patient will be cured of
a disease, and the likelihood that a patient's disease will respond to a particular therapy
(wherein response may be defined in any of a variety of ways). Prognostic and
predictive information are included within the broad category of diagnostic
information.

Sample: As used herein, a sample obtained from a subject may include, but is not 
limited to, any or all of the following: a cell or cells, a portion of tissue, blood, serum,
ascites, urine, saliva, and other body fluids, secretions, or excretions. The term
"sample" also includes any material derived by processing such a sample. Derived
samples may include nucleic acids or proteins extracted from the sample or obtained
by subjecting the sample to techniques such as amplification or reverse transcription
of mRNA, etc.

Specific binding: As used herein, the term refers to an interaction between a target
polypeptide (or, more generally, a target molecule) and a binding molecule such as an
antibody, agonist, or antagonist. The interaction is typically dependent upon the
presence of a particular structural feature of the target polypeptide such as an
antigenic determinant or epitope recognized by the binding molecule. For example, if
an antibody is specific for epitope A, the presence of a polypeptide containing epitope
A or the presence of free unlabeled A in a reaction containing both free labeled A and
the antibody thereto, will reduce the amount of labeled A that binds to the antibody.
It is to be understood that specificity need not be absolute. For example, it is well
known in the art that numerous antibodies cross-react with other epitopes in addition
to those present in the target molecule. Such cross-reactivity may be acceptable
depending upon the application for which the antibody is to be used. Thus the degree
of specificity of an antibody will depend on the context in which it is being used. In
general, an antibody exhibits specificity for a particular partner if it favors binding of
that partner above binding of other potential partners. One of ordinary skill in the art
will be able to select antibodies having a sufficient degree of specificity to perform
appropriately in any given application (e.g., for detection of a target molecule, for
therapeutic purposes, etc). It is also to be understood that specificity may be
evaluated in the context of additional factors such as the affinity of the binding
molecule for the target polypeptide versus the affinity of the binding molecule for
other targets, e.g., competitors. If a binding molecule exhibits a high affinity for a
target molecule that it is desired to detect and low affinity for nontarget molecules, the
antibody will likely be an acceptable reagent for immunodiagnostic purposes. Once
the specificity of a binding molecule is established in one or more contexts, it may be
employed in other, preferably similar, contexts without necessarily re-evaluating its
specificity.

Treating a tumor: As used herein, treating a tumor is taken to mean treating a subject 
who has the tumor. 


Tumor sample: The term "tumor sample" as used herein is taken broadly to include 
cell or tissue samples removed from a tumor, cells (or their progeny) derived from a
tumor that may be located elsewhere in the body (e.g., cells in the bloodstream or at a
site of metastasis), or any material derived by processing such a sample. Derived
tumor samples may include nucleic acids or proteins extracted from the sample or
obtained by subjecting the sample to techniques such as amplification or reverse
transcription of mRNA, etc.

Tumor subclass: A tumor subclass, also referred to herein as a tumor subset or tumor 
class, is the group of tumors that display one or more phenotypic or genotypic 
characteristics that distinguish members of the group from other tumors. 

Vector: A vector, as used herein, is a nucleic acid molecule that includes sequences
sufficient to direct in vivo or in vitro replication of the molecule. These may either be
self-replication sequences or sequences sufficient to direct integration of the vector
into another nucleic acid present in a cell (either an endogenous nucleic acid or one
introduced into the cell by experimental manipulation), so that the vector sequences
are replicated during replication of this nucleic acid. Preferred vectors include a
cloning site, at which foreign nucleic acid molecules may be introduced. Vectors may
include control sequences that have the ability to direct in vivo or in vitro expression
of nucleic acid sequences introduced into the vector. Such control sequences may
include, for example, transcriptional control sequences (e.g., promoters, enhancers,
terminators, etc.), splicing control sequences, translational control sequences, etc.
Vectors may also include a coding sequence, so that transcription and translation of
sequences introduced into the vector results in production of a fusion protein.

I. Overview

The invention is based on the identification of polynucleotides (cDNAs) 
corresponding to human genes that are differentially expressed in human breast tumor 
samples, the polypeptides encoded by these polynucleotides, and antibodies raised 
against these polypeptides. The invention encompasses the use of these
polynucleotides, polypeptides, and antibodies as well as compositions containing
them, either singly or in combination, in the prediction, diagnosis, treatment, or
prevention of cancer and in the provision of prognostic and predictive information
related to cancer. 

Nucleic acids encoding BSTP-ECG1 were identified based on expression 
profiles gathered in a series of cDNA microarray experiments. As described in more 
detail in Examples 1, 2, and 4, cDNA microarrays each representing the same set of
approximately 8100 different human genes were produced. The human cDNA clones
used to produce the microarrays contained approximately 4000 named genes, 2000
genes with homology to named genes in other species, and approximately 2000 ESTs
of unknown function. An mRNA sample was obtained from each of a set of 84 tissue
samples or cell lines. The expression levels of the approximately 8100 genes were
measured in each mRNA sample by hybridization to an individual microarray,
yielding an expression profile for each gene across the experimental samples. These
expression profiles were studied and compared and were used to identify nucleic acids
encoding BSTP-ECG1. Although more details will be found in the Examples, a
description of cDNA microarray technology and a description of the experimental
approach employed in the identification of the polynucleotides that encode
BSTP-ECG1 is presented here so that the invention may be better understood. Certain
aspects of the invention are then described in detail. 

II. cDNA Microarray Technology

cDNA microarrays consist of multiple (usually thousands of) different cDNAs
spotted (usually using a robotic spotting device) onto known locations on a solid
support, such as a glass microscope slide. After spotting, the cDNAs are usually
cross-linked to the support, e.g., by UV irradiation. The cDNAs are typically obtained
by PCR amplification of plasmid library inserts using primers complementary to the
vector backbone portion of the plasmid or to the gene itself for genes where sequence
is known. PCR products suitable for production of microarrays are typically between
0.5 and 2.5 kb in length. Full length cDNAs, expressed sequence tags (ESTs), or
randomly chosen cDNAs from any library of interest can be chosen. ESTs are
partially sequenced cDNAs as described, for example, in L. Hillier, et al., Generation
and analysis of 280,000 human expressed sequence tags, Genome Research, 6,
807-828, 1996. The
afore-mentioned article is herein incorporated by reference, as are the entire teachings
of all other patents and journal articles mentioned herein, for all purposes and not just
those related to the particular context in which they are mentioned. The present
application also incorporates by reference six U.S. patent applications filed by
inventors on July 26, 2001. These applications are entitled "REAGENTS AND
METHODS FOR USE IN MANAGING BREAST CANCER", "BSTP-RAS/RERG
PROTEIN AND RELATED REAGENTS AND METHODS OF USE THEREOF",
"BSTP-CAD PROTEIN AND RELATED REAGENTS AND METHODS OF USE
THEREOF", "BASAL CELL MARKERS IN BREAST CANCER AND USES
THEREOF", "BSTP-TRANS PROTEIN AND RELATED REAGENTS AND
METHODS OF USE THEREOF", and "BSTP-S PROTEINS AND RELATED
REAGENTS AND METHODS OF USE THEREOF".

Although some ESTs correspond to known genes, frequently very little or no
information regarding any particular EST is available except for a small amount of 3'
and/or 5' sequence and, possibly, the tissue of origin of the mRNA from which the
EST was derived. As will be appreciated by one of ordinary skill in the art, in general
the cDNAs contain sufficient sequence information to uniquely identify a gene within
the human genome. Furthermore, in general the cDNAs are of sufficient length to
hybridize specifically to cDNA obtained from mRNA derived from a single gene
under the hybridization conditions of the experiment.

In a typical microarray experiment, a microarray is hybridized with
differentially labeled RNA or DNA populations derived from two different samples.
Most commonly RNA (either total RNA or poly A+ RNA) is isolated from cells or
tissues of interest and is reverse transcribed to yield cDNA. Labeling is usually
performed during reverse transcription by incorporating a labeled nucleotide in the
reaction mixture. Although various labels can be used, most commonly the
nucleotide is conjugated with the fluorescent dyes Cy3 or Cy5. For example,
Cy5-dUTP and Cy3-dUTP can be used. cDNA derived from one sample (representing, for
example, a particular cell type, tissue type or growth condition) is labeled with one
fluor while cDNA derived from a second sample (representing, for example, a
different cell type, tissue type, or growth condition) is labeled with the second fluor.
Similar amounts of labeled material from the two samples are cohybridized to the
microarray. In the case of a microarray experiment in which the samples are labeled
with Cy5 (which fluoresces red) and Cy3 (which fluoresces green), the primary data
(obtained by scanning the microarray using a detector capable of quantitatively
detecting fluorescence intensity) are ratios of fluorescence intensity (red/green, R/G).
These ratios represent the relative concentrations of cDNA molecules that hybridized
to the cDNAs represented on the microarray and thus reflect the relative expression
levels of the mRNA corresponding to each cDNA/gene represented on the microarray.
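
The red/green ratio described above can be illustrated with a short sketch. The log2 transformation is a common convention for two-color data and is an assumption here, not something the text specifies; the spot names and intensities are hypothetical.

```python
import math


def log2_ratio(red, green):
    """Return log2(red / green) for one spot.

    Positive values mean more Cy5 (red) signal, negative values more
    Cy3 (green) signal, and 0 means equal signal in both channels.
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red / green)


# Hypothetical background-corrected intensities per spot: (Cy5, Cy3)
spots = {
    "gene_a": (4000.0, 1000.0),
    "gene_b": (500.0, 500.0),
    "gene_c": (250.0, 2000.0),
}

ratios = {name: log2_ratio(red, green) for name, (red, green) in spots.items()}
```

On a log scale a fourfold increase and a fourfold decrease sit at equal distances from zero, which makes over- and under-expression easier to compare.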

Each microarray experiment can provide tens of thousands of data points, each
representing the relative expression of a particular gene in the two samples.
Appropriate organization and analysis of the data are of key importance. Various
computer programs have been developed to facilitate data analysis. One basis for
organizing gene expression data is to group genes with similar expression patterns
together into clusters. A method for performing hierarchical cluster analysis and
display of data derived from microarray experiments is described in Eisen, M.,
Spellman, P., Brown, P., and Botstein, D., Cluster analysis and display of
genome-wide expression patterns, Proc. Natl. Acad. Sci. USA, 95: 14863-14868, 1998. As
described therein, clustering can be combined with a graphical representation of the
primary data in which each data point is represented with a color that quantitatively
and qualitatively represents that data point. By converting the data from a large table
of numbers into a visual format, this process facilitates an intuitive analysis of the
data. Additional information and details regarding the mathematical tools and/or the
clustering approach itself may be found, for example, in Sokal, R.R. & Sneath,
P.H.A. Principles of numerical taxonomy, xvi, 359, W. H. Freeman, San
Francisco, 1963; Hartigan, J.A. Clustering algorithms, xiii, 351, Wiley, New York,
1975; Paull, K.D. et al. Display and analysis of patterns of differential activity of
drugs against human tumor cell lines: development of mean graph and COMPARE
algorithm. J Natl Cancer Inst 81, 1088-92, 1989; Weinstein, J.N. et al. Neural
computing in cancer drug development: predicting mechanism of action. Science 258,
447-51, 1992; van Osdol, W.W., Myers, T.G., Paull, K.D., Kohn, K.W. & Weinstein,
J.N. Use of the Kohonen self-organizing map to study the mechanisms of action of
chemotherapeutic agents. J Natl Cancer Inst 86, 1853-9, 1994; and Weinstein, J.N.
et al. An information-intensive approach to the molecular pharmacology of cancer.
Science, 275, 343-9, 1997.

Further details of the experimental methods used in the present invention are
found in the Examples. Additional information describing methods for fabricating
and using microarrays is found in U.S. Patent No. 5,807,522, which is herein
incorporated by reference. Instructions for constructing microarray hardware (e.g.,
arrayers and scanners) using commercially available parts can be found at
http://cmgm.stanford.edu/pbrown/ and in Cheung, V., Morley, M., Aguilar, F.,
Massimi, A., Kucherlapati, R., and Childs, G., Making and reading microarrays,
Nature Genetics Supplement, 21:15-19, 1999, which are herein incorporated by
reference. Additional discussions of microarray technology and protocols for
preparing samples and performing microarray experiments are found in, for example,
DNA arrays for analysis of gene expression, Methods Enzymol, 303:179-205, 1999;
Fluorescence-based expression monitoring using microarrays, Methods Enzymol, 306:
3-18, 1999; and M. Schena (ed.), DNA Microarrays: A Practical Approach, Oxford
University Press, Oxford, UK, 1999. Descriptions of how to use an arrayer and the
associated software are found at http://cmgm.stanford.edu^^ which is
herein incorporated by reference.

III. Experimental Approach of the Invention

The present invention encompasses the realization that genes that are
differentially expressed in tumors are of use in tumor classification and are targets for
the development of diagnostic and therapeutic agents. Differentially expressed genes
are likely to be responsible for the different phenotypic characteristics of tumors. The
present invention identifies one such gene. In general, a differentially expressed gene
is a gene whose transcript abundance varies between different tumor samples. For
example, and without intending to be limiting, the transcript level of a differentially
expressed gene may vary by at least fourfold from its average abundance in a given
sample set in at least one sample, at least two samples, at least three samples, etc. Of
course other criteria for differential expression may be employed.
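
The fold-change criterion just described can be sketched as a small filter. This is a minimal illustration under stated assumptions: the function name, the default parameters, and the choice to count deviations in either direction are hypothetical, and the example abundances are made up.

```python
from statistics import mean, median


def is_differentially_expressed(levels, fold=4.0, min_samples=1, center=mean):
    """Return True if enough samples deviate from the central abundance.

    A sample counts as a hit when its (positive) abundance is at least
    `fold` times above or below the value returned by `center`, which
    may be `mean` or `median`.
    """
    reference = center(levels)
    hits = sum(
        1 for level in levels
        if level >= fold * reference or level <= reference / fold
    )
    return hits >= min_samples


# Hypothetical relative abundances of one transcript across five samples
profile = [1.0, 1.0, 1.0, 1.0, 16.0]
flat_profile = [2.0, 2.0, 2.0, 2.0]

varies_from_mean = is_differentially_expressed(profile)
varies_from_median = is_differentially_expressed(profile, min_samples=3, center=median)
flat_varies = is_differentially_expressed(flat_profile)
```

The choice of central value matters: one extreme sample pulls the mean toward itself, so the same profile can pass a mean-based test and fail a median-based one with a stricter sample count.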



WO 02/08260 PCT/US01/23439



While analysis of multiple genes is of use in developing a robust classification
of tumors, each of the differentially expressed genes and their encoded proteins is a
target for the development of diagnostic and therapeutic agents. Investigation of
variation in individual genes in breast tumors reveals that molecular variation can be
related to important features of clinical variation. For example, studies of the
expression of the estrogen receptor alpha gene (ESR1) and of the Erb-B2/HER2/neu
oncogene (the target for the monoclonal antibody Herceptin® (Trastuzumab), a recently
approved treatment for certain patients with metastatic breast cancer), and of the
mutational status at the TP53, BRCA1 and BRCA2 loci, have each linked a molecular
feature to clinical behavior. (Discussed, for example, in Osborne, C.K., et
al., The value of estrogen and progesterone receptors in the treatment of breast cancer,
Cancer 46, 2884-2888, 1980; Ingvarsson, S., Molecular genetics of breast cancer
progression, Seminars in Cancer Biology, 9, 277-288, 1999; Breast Cancer Linkage
Consortium, Pathology of familial breast cancer: differences between breast cancers in
carriers of BRCA1 and BRCA2 mutations and sporadic cases, Lancet, 349, 1505-1510,
1997; Anderson, T. I., et al., Prognostic significance of TP53 alterations in breast
carcinoma. Br J Cancer, 68, 540-548, 1993 and references cited in these articles). In
particular, approximately 60% to 70% of breast tumors express the estrogen receptor,
and this expression has been shown to be a favorable prognostic factor (reviewed in
Allred, D.C., et al. Prognostic and Predictive Factors in Breast Cancer by
Immunohistochemical Analysis, Modern Pathology, 11(2), 155-168, 1998). As these
examples demonstrate, the study of genetic variation in breast cancer has the potential
to contribute to improved classification, diagnosis, and therapy for patients.

As described in more detail in Examples 1, 2, and 4, in order to identify genes
that are differentially expressed in breast tumors, cDNA microarrays each
representing the same set of approximately 8100 different human genes were
produced. The human cDNA clones used to produce the microarrays contained
approximately 4000 named genes, 2000 genes with homology to named genes in other
species, and approximately 2000 ESTs of unknown function. An mRNA sample was
obtained from each of a set of 84 tissue samples or cell lines. The expression levels of
the approximately 8100 genes were measured in each mRNA sample by hybridization
to an individual microarray, yielding an expression profile for each gene across the
experimental samples. Although more details will be found in the Examples, an
overview of the experimental procedure is presented here so that the invention may be
better understood.

Variation in patterns of gene expression was characterized in 62 breast tumor
samples from 40 different patients, 3 normal breast tissue samples, and 19 samples
from 17 cultured human cell lines (one of which was sampled 3 times under different
conditions). Twenty of the tumors had been sampled twice, before and after a 16 week
course of doxorubicin chemotherapy, and two tumors were paired with a lymph node
metastasis from the same patient. The other 18 tumor samples were single samples
from individual tumors. A detailed listing of the tumor samples and various
characteristics including clinical estrogen receptor and Erb-B2 status as assessed using
antibody staining, estrogen receptor and Erb-B2 status as assessed by microarray
result, tumor grade, differentiation, survival status and time, age at diagnosis,
doxorubicin response, and p53 status is presented in Table 5 of the gene subset
application. A listing of the cell lines including description and ATCC (American
Type Culture Collection) number or reference is presented in Table 3 of the gene
subset application. The cell lines provided a framework for interpreting the variation
in gene expression patterns seen in the tumor samples and included gene expression
models for many of the cell types encountered in tumors.

As described in more detail in Example 2, mRNA was isolated from each
sample. cDNA labeled with the fluorescent dye Cy5 was prepared from each
experimental sample separately. Fluorescently labeled cDNA, labeled using a second
distinguishable dye (Cy3), was prepared from a pool of mRNAs isolated from 11
different cultured cell lines. The pooled mRNA sample served as a reference to
provide a common internal standard against which each gene's expression in each
experimental sample was measured.

Comparative expression measurements were made by separately mixing
Cy5-labeled experimental cDNA derived from each of the 84 samples with a portion of the
Cy3-labeled reference cDNA, and hybridizing each mixture to an individual cDNA
microarray. The ratio of Cy5 fluorescence to Cy3 fluorescence measured at each
cDNA element on the microarray was then quantitatively measured. The use of a
common reference standard in each hybridization allowed the fluorescence ratios to be
treated as comparative measurements of the expression level of each gene across all
the experimental samples.

A hierarchical clustering method (Eisen, et al., 1998) was used to group genes
based on similarity in the pattern with which their expression varied over all
experimental samples. The same clustering method was used to group the
experimental samples (tissue and cell lines separately) based on the similarity in their
patterns of expression. Interpretation of the data obtained from the clustering
algorithm was facilitated by displaying the data in the form of tumor and gene
dendrograms. In the tumor dendrograms, the pattern and length of the branches
reflect the relatedness of the tumor samples with respect to their expression of genes
represented on the microarray. Microarray images, and tumor and gene dendrograms
are available in Perou, et al., Nature, 2000, and at inventors' Web site
(http://genome-www.stanford.edu/molecularportraits/). The similarity of the gene
expression profiles of individual tumor samples or groups of tumor samples to one
another is inversely related to the length of the branches that connect them. Thus, for
example, adjacent tumor samples connected to one another by short vertical branches
descending from a common horizontal branch (e.g., tumor samples Norway 48-BE and
Norway 48-AF close to the right of the tumor dendrogram) are more closely related to
one another in terms of their gene expression profiles than adjacent tumor samples
connected to one another by longer vertical branches descending from a common
horizontal branch (e.g., tumor samples Norway 100-BE and Norway 100-AF at the left
side of the tumor dendrogram). To the extent that the gene expression programs dictate
the biological properties and behavior of the tumors and reflect their physiological
state and environment, it is expected that the clustering of the tumors reflects phenotypic
relationships among them, e.g., tumor samples connected by short horizontal branches
(i.e., located in close proximity to one another) are expected to exhibit similar
phenotypic features. In the gene dendrograms, the pattern and length of the branches
reflect the relatedness of the genes with respect to their expression profiles across the
tumor samples. Similarly to the tumor samples, genes connected by short vertical
branches are more similar to one another in terms of expression profile than genes
connected by longer vertical branches.

The expression patterns of the genes were also displayed using a matrix
format, with each row representing all of the hybridization results for a single cDNA
element on the array and each column representing the measured expression levels for
all genes in a single sample. In this format, tumor samples with similar patterns of
expression across the gene set are close to each other along the horizontal dimension.
Similarly, genes with similar expression patterns across the set of samples are close to
each other along the vertical dimension. To allow the patterns of expression to be
visualized, the normalized expression value of each gene was represented by a colored
box, using red to represent expression levels greater than the median and green to
represent expression levels less than the median. In all array images the brightest red
color represents transcript levels at least 16-fold greater than the median, and the
brightest green color represents transcript levels at least 16-fold below the median.
This display format facilitates comparisons between genes and the recognition of
significant patterns. Certain gene subsets of particular interest are indicated by
colored bars along the right side of the matrices. These subsets are discussed further
in the gene set application.
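
The red/green color coding described above can be sketched as a mapping from an expression value to an RGB triple. The text fixes only the endpoints (brightest color at 16-fold above or below the median); scaling the brightness linearly in log2 fold change between those endpoints is an assumption made for illustration, as is the function name.

```python
import math


def fold_to_rgb(value, median, saturation_fold=16.0):
    """Map an expression value to an (R, G, B) color relative to the median.

    Values above the median are shades of red, values below are shades
    of green, and the color saturates at `saturation_fold` in either
    direction. A value equal to the median maps to black.
    """
    fold = value / median
    scale = min(abs(math.log2(fold)) / math.log2(saturation_fold), 1.0)
    level = round(255 * scale)
    return (level, 0, 0) if fold > 1 else (0, level, 0)


brightest_red = fold_to_rgb(16.0, 1.0)
brightest_green = fold_to_rgb(1.0 / 16.0, 1.0)
at_median = fold_to_rgb(1.0, 1.0)
```

Values beyond the saturation point (for example 64-fold above the median) are drawn with the same brightest color, which matches the "at least 16-fold" wording.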

IV. Identification and characterization of sequences encoding BSTP-ECG1 

I.M.A.G.E. clone 161484 was identified based on the expression pattern of its
corresponding mRNA among the 84 samples analyzed by microarray hybridization.
In particular, transcripts corresponding to clone 161484 varied in abundance by at
least 4-fold from their median abundance in the sample set, among at least 3 of the 84
samples. Thus the polynucleotide corresponding to clone 161484 was differentially
expressed among the tumor samples, indicating its potential utility for classifying
tumors. Numerical data indicating the measured expression of mRNA corresponding
to clone 161484 (i.e., mRNA hybridizing to clone 161484) appear in Table 1. In
Table 1, information pertaining to clone 161484 is entered under the GenBank
accession number of the clone, i.e., H25606. The complete entry on the referral page
of Table 1 for clone 161484 is: ESTS, WEAKLY SIMILAR TO W01A11.2 GENE
PRODUCT [C.ELEGANS] H25606. In the color matrix representations the
expression profile for clone 161484 is identified with the label ESTS, WEAKLY
SIMILAR TO W01A11.2 GENE PRODUCT [C.ELEGANS]. The expression profile
and additional data related to clone 161484 are available at Applicants' Web site
(http://genome-www.stanford.edu/molecularportraits/) in Supplementary Figure 4. As
shown therein, the expression level of mRNA corresponding to clone 161484 varied
significantly among tumor samples. For example, mRNA corresponding to clone
161484 was particularly highly expressed in tumor samples Norway 18-BE, Norway
104-BE, Norway 112-BE, Norway 112-AF, and Stanford 14. The relative expression
level was also high in tumors Norway 18-AF, Norway 12-AF, and Norway 26-BE,
among others. The relative expression level of mRNA corresponding to clone 161484
was particularly low in tumor samples Norway 27-BE and Norway 7-AF and was also
low in tumor samples Norway 15-BE, Stanford 38-P, Stanford 38-LN, Norway 56,
Norway 16, Stanford 24, Norway 27-AF, New York 1, Norway 39-AF, and Norway
102-BE, among others. As is evident from the discussion above, the presence of
mRNA corresponding to clone 161484 reflects the expression of the gene from which
the mRNA is transcribed.

A search of GenBank revealed that only a small portion at the 3' end and a
small portion at the 5' end of clone 161484 had been sequenced. To confirm the
identity of the clone actually used on the array, sequencing runs from the 3' and 5'
ends of the clone were performed. As expected, the sequences obtained corresponded
to the 3' and 5' sequences in GenBank. Overlapping clones (I.M.A.G.E. clones 48805,
1276329, 1343900, and 1560906) were identified by first searching GenBank for
clones homologous to clone 161484 and then searching for additional clones
homologous to the clones identified as homologous to clone 161484. A consensus
nucleotide sequence (SEQ ID NO: 5) was derived based on sequencing and analysis
of overlapping I.M.A.G.E. clones 161484, 48805, 1276329, 1343900, and 1560906.
In SEQ ID NO: 5, the abbreviation N stands for any nucleotide. The consensus
sequence (SEQ ID NO: 5) was used as input for a search of GenBank with the
BLASTX program (which translates a nucleotide sequence in each of the possible six
reading frames and then searches for homologous amino acid sequences). The search
indicated that a portion of the translated amino acid sequence in one reading frame
had homologs in a large number of eukaryotic species including C. elegans. An open
reading frame (SEQ ID NO: 2) encoding this portion was identified within the
consensus sequence. The predicted amino acid sequence for the polypeptide encoded
by this open reading frame is presented as SEQ ID NO: 1.
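
The six-frame translation step that BLASTX performs before its protein search can be sketched as follows. This is an illustrative sketch only, not the BLASTX implementation: it uses the standard genetic code, renders any codon containing N as X, and the function names and the short example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return translations for frames +1..+3 and -1..-3 of a DNA string."""
    forward = seq.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
    return {
        f"{sign}{offset + 1}": translate(strand[offset:])
        for sign, strand in (("+", forward), ("-", reverse))
        for offset in range(3)
    }


frames = six_frame_translations("ATGGCCTAA")
```

Each of the six translated strings would then be compared against protein sequences; only one frame is expected to contain a long stretch without a stop codon (`*`).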

Thus in one embodiment, the invention encompasses a polypeptide comprising
the amino acid sequence of SEQ ID NO: 1. The polypeptide is 388 amino acids in
length. A search of the GenBank database using the BLASTP computer program
(Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang,
Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and
PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res.
25:3389-3402) performed with this sequence indicated that this polypeptide has at
least one homolog in a large number of eukaryotic species, consistent with the
information obtained from the BLASTX search described above. Due to the fact that
the polypeptide comprising the amino acid sequence presented in SEQ ID NO: 1 is
highly conserved across eukaryotes, the polypeptide will be referred to herein as
BSTP-ECG1 (for Breast Protein - Eukaryotic Conserved Gene 1), and the gene
encoding BSTP-ECG1 will be referred to as BST-ECG1. While not wishing to be
bound by any theory, the fact that ECG1 is so highly conserved across eukaryotes
may indicate that it performs an essential cellular function. Figure 3 presents an
alignment of BSTP-ECG1 with a number of related proteins identified in GenBank, in
which identical amino acids are shaded. GenBank accession numbers are listed before
the name of each protein. BSTP-ECG1 is 40-37% identical to the other proteins in
this alignment.

Analysis of the BSTP-ECG1 coding sequence using three different techniques
indicated the presence of a putative transmembrane domain between amino acids 66
and 115. Figure 4 presents a sequence map of BSTP-ECG1 in which the predicted
transmembrane domain is highlighted in gray. Figure 5A presents a Kyte-Doolittle
hydrophobicity plot for BSTP-ECG1 (Kyte, J. and Doolittle, R.F., A Simple Method for
Displaying the Hydropathic Character of a Protein. Journal of Molecular Biology
157(6): 105-142, 1982). Figure 5B presents a prediction of transmembrane regions
and orientation for BSTP-ECG1 obtained using the program TMpred (K. Hofmann &
W. Stoffel (1993) TMbase - A database of membrane spanning proteins segments.
Biol. Chem. Hoppe-Seyler 347, 166). Figure 5C presents a prediction of
transmembrane helices for BSTP-ECG1 produced using a hidden Markov model (Erik
L.L. Sonnhammer, Gunnar von Heijne, and Anders Krogh: A hidden Markov model
for predicting transmembrane helices in protein sequences. In Proc. of Sixth Int. Conf.
on Intelligent Systems for Molecular Biology, p 175-182 Ed J. Glasgow, T. Littlejohn,
F. Major, R. Lathrop, D. Sankoff, and C. Sensen. Menlo Park, CA: AAAI Press,
1998).
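
A Kyte-Doolittle plot of the kind shown in Figure 5A is a sliding-window average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the 19-residue window is a common choice for finding membrane-spanning segments and is an assumption here, since the window used for the figure is not stated. The test peptide is hypothetical and is not SEQ ID NO: 1.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    scores = [KD_SCALE[residue] for residue in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical peptide: a 19-residue leucine stretch flanked by charged residues
peptide = "DDDDD" + "L" * 19 + "KKKKK"
profile = hydropathy_profile(peptide)
peak_index = profile.index(max(profile))
```

Windows with a strongly positive average mark hydrophobic stretches, which is why a putative transmembrane domain shows up as a peak in such a plot.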

Northern analysis (see Figures 6A and 6B and Example 10) revealed that
BST-ECG1 mRNA can be detected in a variety of tumor-derived cell lines, including a
breast adenocarcinoma cell line (MCF-7). While not wishing to be bound by any
theory, these data suggest that in addition to being useful for the diagnosis and/or
therapy of breast cancer, the methods and reagents described herein may be useful in
the diagnosis and/or therapy of additional tumor types. Northern analysis confirmed
the existence of multiple mRNA isoforms encoding BSTP-ECG1, which arise due to
alternate 3' polyadenylation sites. The Northern blot showed 2 bands of
approximately 1.5 and 2.2 kB. The sequence of a cDNA corresponding to the ~2.2 kB
band is presented as SEQ ID NO: 3 (2445 nucleotides). The sequence of a cDNA
corresponding to the ~1.5 kB band is presented as SEQ ID NO: 4 (1543 nucleotides).
In both SEQ ID NO: 3 and SEQ ID NO: 4 the initiation codon begins at nucleotide
228, and the stop codon begins at nucleotide 1392.
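
The coordinates given above are consistent with the stated polypeptide length of 388 amino acids, as a short calculation shows. The helper below is a hypothetical illustration; it assumes 1-based positions for the first nucleotide of the initiation and stop codons, as in the text.

```python
def orf_summary(start_pos, stop_pos, transcript_len):
    """Summarize an open reading frame from 1-based codon start positions.

    `start_pos` is the first nucleotide of the initiation codon and
    `stop_pos` is the first nucleotide of the stop codon.
    """
    span = stop_pos - start_pos
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    return {
        "codons": span // 3,                        # sense codons, including the start
        "utr5": start_pos - 1,                      # nucleotides before the start codon
        "utr3": transcript_len - (stop_pos + 2),    # nucleotides after the stop codon
    }


long_form = orf_summary(228, 1392, 2445)   # SEQ ID NO: 3
short_form = orf_summary(228, 1392, 1543)  # SEQ ID NO: 4
```

Both cDNAs share the same 388-codon reading frame and 227-nucleotide 5' region; they differ only in the length of the 3' untranslated region (1051 versus 149 nucleotides), in line with the alternate polyadenylation sites noted in the text.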

The fact that BST-ECG1 is differentially expressed among breast tumors
indicates that the expression level of BST-ECG1 and/or of its encoded polypeptide,
BSTP-ECG1, can be used to distinguish between different subsets of breast tumors.
For example, while not wishing to be bound by any theory, the expression level of
BST-ECG1 and/or BSTP-ECG1 may be used, either alone or in combination with
other data, to distinguish between tumors falling into phenotypic categories such as a
good prognosis category, a poor prognosis category, a nonresponder category (where
a different nonresponder category may be defined with respect to each particular
therapy), etc. The various categories may be defined in any of a variety of ways and
need not be absolute. For example, a good prognosis category may be defined as a
category including tumors for which the average survival of patients having tumors in
the category is greater than 10 years. As another example, a nonresponder category
may be defined as a category including tumors for which the average response rate to
a particular therapy is less than 5%, where a "response" can also be defined according
to criteria typically employed in studies such as clinical trials. The expression of
BST-ECG1 may be used to provide prognostic information for breast cancer patients and/or
in the selection of appropriate therapy.

Determining the expression of BST-ECG1 may include measuring mRNA
transcribed from BST-ECG1, e.g., mRNA comprising a nucleotide sequence set forth
in SEQ ID NO: 2 or 3 or variants thereof. Determining the expression of BST-ECG1
may include detecting (qualitatively or quantitatively) a translation product of
BST-ECG1, e.g., a polypeptide comprising an amino acid sequence set forth in SEQ ID
NO: 1 or a variant thereof.

The discovery of BSTP-ECG1 and the discovery of the differential expression
of the gene encoding BSTP-ECG1 satisfy a need in the art by providing compositions
useful in the diagnosis, treatment, and prevention of cancer, particularly breast cancer,
and by providing methods useful in the classification of cancer and the provision of
prognostic information to patients with cancer. Furthermore, BSTP-ECG1 is likely to
be a transmembrane protein, indicating that it will likely be accessible to therapeutic
agents such as antibodies and/or small molecules. These results suggest that
BST-ECG1 may be a useful gene to target for therapeutic intervention in subsets of breast
cancer and a useful gene to distinguish between different subsets of breast cancer.

V. Further Aspects of the Invention

A. Polynucleotides, polypeptides, antibodies, vectors, and host cells 

The invention encompasses a polypeptide whose amino acid sequence has or
comprises the sequence set forth in SEQ ID NO: 1. The invention also encompasses
polypeptides possessing significant similarity to BSTP-ECG1, i.e., polypeptides whose
sequence possesses significant similarity to the sequence of SEQ ID NO: 1. In certain
embodiments of the invention a significantly similar polypeptide has one or more
amino acid substitutions, deletions, and/or additions with respect to the sequence of
SEQ ID NO: 1. In certain embodiments of the invention a significantly similar
polypeptide is encoded by a human gene. Definitions of "significantly similar" make
reference to the BLAST algorithm and BLOSUM substitution matrix, which are
described in Altschul, SF, et al., "Gapped BLAST and PSI-BLAST: a new generation
of protein database search programs", Nucleic Acids Res. 25:3389-3402, 1997 and
Henikoff, S. and Henikoff, J., "Amino acid substitution matrices from protein blocks",
Proc. Natl. Acad. Sci. 89, 10915-10919, 1992.

In certain embodiments of the invention a polypeptide is considered
significantly similar if, when the amino acid sequence of the polypeptide is compared
with the amino acid sequence of the polypeptide of SEQ ID NO: 1 using the BLAST
algorithm and the BLOSUM substitution matrix with default parameters, the result is
a % identity greater than 60 or a % positive greater than 70 encompassing at least 25%
of the length of SEQ ID NO: 1, or both. In other embodiments of the invention a
polypeptide is considered significantly similar if, when the amino acid sequence of the
polypeptide is compared with the amino acid sequence of the polypeptide of SEQ ID
NO: 1 using the BLAST algorithm and the BLOSUM substitution matrix with default
parameters, the result is a % identity greater than 60 or a % positive greater than 70
encompassing at least 50% of the length of SEQ ID NO: 1, or both. In other
embodiments of the invention a polypeptide is considered significantly similar if,
when the amino acid sequence of the polypeptide is compared with the amino acid
sequence of the polypeptide of SEQ ID NO: 1 using the BLAST algorithm and the
BLOSUM substitution matrix with default parameters, the result is a % identity
greater than 60 or a % positive greater than 70 encompassing at least 75% of the
length of SEQ ID NO: 1, or both. In other embodiments of the invention a polypeptide
is considered significantly similar if, when the amino acid sequence of the polypeptide
is compared with the amino acid sequence of the polypeptide of SEQ ID NO: 1 using
the BLAST algorithm and the BLOSUM substitution matrix with default parameters,
the result is a % identity greater than 60 or a % positive greater than 70 encompassing
at least 90% of the length of SEQ ID NO: 1, or both. In other embodiments of the
invention a polypeptide is considered significantly similar if, when the amino acid
sequence of the polypeptide is compared with the amino acid sequence of the
polypeptide of SEQ ID NO: 1 using the BLAST algorithm and the BLOSUM
substitution matrix with default parameters, the result is a % identity greater than 60
or a % positive greater than 70 encompassing at least 95% of the length of SEQ ID
NO: 1, or both.
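
The family of thresholds recited in these embodiments can be expressed as one parameterized test. The sketch below is one reading of the criterion, offered as an illustration only: it assumes that the % identity and % positive values come from a single BLAST alignment and that "encompassing at least N% of the length" means the aligned region covers at least that fraction of the 388 residues of SEQ ID NO: 1. The function name and the example numbers are hypothetical.

```python
SEQ_ID_NO_1_LENGTH = 388  # length of the reference polypeptide, from the text


def significantly_similar(pct_identity, pct_positive, aligned_len,
                          min_identity=60.0, min_positive=70.0,
                          min_coverage=0.25,
                          reference_len=SEQ_ID_NO_1_LENGTH):
    """Apply an identity-or-positives threshold over a minimum coverage."""
    coverage = aligned_len / reference_len
    passes_score = pct_identity > min_identity or pct_positive > min_positive
    return coverage >= min_coverage and passes_score


# Hypothetical alignment results
hit_a = significantly_similar(65.0, 68.0, aligned_len=100)   # identity passes
hit_b = significantly_similar(55.0, 72.0, aligned_len=97)    # positives pass, exactly 25%
hit_c = significantly_similar(65.0, 75.0, aligned_len=90)    # too short
hit_d = significantly_similar(85.0, 92.0, aligned_len=380,
                              min_identity=80.0, min_positive=90.0,
                              min_coverage=0.95)              # a stricter embodiment
```

Raising `min_identity`, `min_positive`, and `min_coverage` together reproduces the progressively stricter embodiments (for example 70/80 or 80/90 with 50%, 75%, 90%, or 95% coverage).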

In other embodiments of the invention a polypeptide is considered
significantly similar if, when the amino acid sequence of the polypeptide is compared
with the amino acid sequence of the polypeptide of SEQ ID NO: 1 using the BLAST
algorithm and the BLOSUM substitution matrix with default parameters, the result is
a % identity greater than 70 or a % positive greater than 80 encompassing at least 25%
of the length of SEQ ID NO: 1, or both. In other embodiments of the invention a
polypeptide is considered significantly similar if, when the amino acid sequence of the
polypeptide is compared with the amino acid sequence of the polypeptide of SEQ ID
NO: 1 using the BLAST algorithm and the BLOSUM substitution matrix with default
parameters, the result is a % identity greater than 70 or a % positive greater than 80
encompassing at least 75% of the length of SEQ ID NO: 1, or both. In other
embodiments of the invention a polypeptide is considered significantly similar if,
when the amino acid sequence of the polypeptide is compared with the amino acid
sequence of the polypeptide of SEQ ID NO: 1 using the BLAST algorithm and the
BLOSUM substitution matrix with default parameters, the result is a % identity
greater than 70 or a % positive greater than 80 encompassing at least 90% of the
length of SEQ ID NO: 1, or both. In other embodiments of the invention a polypeptide
is considered significantly similar if, when the amino acid sequence of the polypeptide
is compared with the amino acid sequence of the polypeptide of SEQ ID NO: 1 using
the BLAST algorithm and the BLOSUM substitution matrix with default parameters,
the result is a % identity greater than 70 or a % positive greater than 80 encompassing
at least 95% of the length of SEQ ID NO: 1, or both.

In other embodiments of the invention a polypeptide is considered
significantly similar if, when the amino acid sequence of the polypeptide is compared
with the amino acid sequence of the polypeptide of SEQ ID NO:1 using the BLAST
algorithm and the BLOSUM substitution matrix with default parameters, the result is
a % identity greater than 80 or a % positive greater than 90 encompassing at least 25%
of the length of SEQ ID NO:1, or both. In other embodiments of the invention a
polypeptide is considered significantly similar if, when the amino acid sequence of the
polypeptide is compared with the amino acid sequence of the polypeptide of SEQ ID
NO:1 using the BLAST algorithm and the BLOSUM substitution matrix with default
parameters, the result is a % identity greater than 80 or a % positive greater than 90
encompassing at least 75% of the length of SEQ ID NO:1, or both. In other
embodiments of the invention a polypeptide is considered significantly similar if,
when the amino acid sequence of the polypeptide is compared with the amino acid
sequence of the polypeptide of SEQ ID NO:1 using the BLAST algorithm and the
BLOSUM substitution matrix with default parameters, the result is a % identity
greater than 80 or a % positive greater than 90 encompassing at least 90% of the
length of SEQ ID NO:1, or both. In other embodiments of the invention a polypeptide
is considered significantly similar if, when the amino acid sequence of the polypeptide
is compared with the amino acid sequence of the polypeptide of SEQ ID NO:1 using
the BLAST algorithm and the BLOSUM substitution matrix with default parameters,
the result is a % identity greater than 80 or a % positive greater than 90 encompassing
at least 95% of the length of SEQ ID NO:1, or both.

In other embodiments of the invention a polypeptide is considered
significantly similar if, when the amino acid sequence of the polypeptide is compared
with the amino acid sequence of the polypeptide of SEQ ID NO:1 using the BLAST
algorithm and the BLOSUM substitution matrix with default parameters, the result is
a % identity greater than 90 or a % positive greater than 95 encompassing at least 25%
of the length of SEQ ID NO:1, or both. In other embodiments of the invention a
polypeptide is considered significantly similar if, when the amino acid sequence of the
polypeptide is compared with the amino acid sequence of the polypeptide of SEQ ID
NO:1 using the BLAST algorithm and the BLOSUM substitution matrix with default
parameters, the result is a % identity greater than 90 or a % positive greater than 95
encompassing at least 75% of the length of SEQ ID NO:1, or both. In other
embodiments of the invention a polypeptide is considered significantly similar if,
when the amino acid sequence of the polypeptide is compared with the amino acid
sequence of the polypeptide of SEQ ID NO:1 using the BLAST algorithm and the
BLOSUM substitution matrix with default parameters, the result is a % identity
greater than 90 or a % positive greater than 95 encompassing at least 90% of the
length of SEQ ID NO:1, or both. In other embodiments of the invention a polypeptide
is considered significantly similar if, when the amino acid sequence of the polypeptide
is compared with the amino acid sequence of the polypeptide of SEQ ID NO:1 using
the BLAST algorithm and the BLOSUM substitution matrix with default parameters,
the result is a % identity greater than 90 or a % positive greater than 95 encompassing
at least 95% of the length of SEQ ID NO:1, or both. By "encompassing at least X%
of the length of SEQ ID NO:1" (or, in general, SEQ ID NO:Y) is meant that the
length of the portion of SEQ ID NO:1 that is being compared with a potentially
similar protein is at least X% of the length of SEQ ID NO:1 (or, in general, SEQ ID
NO:Y).
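
The threshold tests recited above can be illustrated with a short sketch. It is illustrative only and assumes one reading of the criteria, namely that the compared portion must cover at least X% of the reference sequence and that either the % identity or the % positive (or both) must exceed its threshold. The class name, field names, and function names are hypothetical, and the alignment values are assumed to come from a separately run BLAST comparison.

```python
from dataclasses import dataclass


@dataclass
class AlignmentResult:
    """Summary values taken from one pairwise alignment (hypothetical container)."""
    percent_identity: float    # % of aligned positions that are identical
    percent_positive: float    # % of aligned positions that are identical or score positively
    aligned_ref_length: int    # number of reference residues covered by the alignment
    ref_length: int            # full length of the reference (e.g., SEQ ID NO:1)


def coverage_percent(result: AlignmentResult) -> float:
    """Percentage of the reference length encompassed by the compared portion."""
    return 100.0 * result.aligned_ref_length / result.ref_length


def is_significantly_similar(result: AlignmentResult,
                             min_identity: float,
                             min_positive: float,
                             min_coverage: float) -> bool:
    """Apply one embodiment: identity OR positives above threshold, over enough length."""
    if coverage_percent(result) < min_coverage:
        return False
    return (result.percent_identity > min_identity
            or result.percent_positive > min_positive)


# Example: 65% identity, 68% positive, covering 80 of 100 reference residues,
# tested against the "greater than 60 / greater than 70 / at least 75%" embodiment.
example = AlignmentResult(65.0, 68.0, 80, 100)
print(is_significantly_similar(example, 60, 70, 75))
```

The same function covers each recited embodiment by changing the three threshold arguments.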

In certain embodiments of the invention a polypeptide having significant
similarity to the polypeptide of SEQ ID NO:1 includes one or more conservative
amino acid substitutions. Examples of conservative substitutions are well known in
the art. See, for example, Biochemistry, 4th Ed., Stryer, L., et al., W. Freeman and
Co., 1995 and U.S. Patent No. 6,015,692. The invention also encompasses variants of
the polypeptide of SEQ ID NO:1 and variants of significantly similar polypeptides,
wherein the variants have one or more altered or modified amino acids. Alterations
and modifications may include the replacement of an L-amino acid with a D-amino
acid, or various modifications including, but not limited to, phosphorylation,
carboxylation, alkylation, etc.

Certain polypeptides having significant similarity to the polypeptide of SEQ
ID NO:1 contain at least one functional or structural characteristic of BSTP-ECG1. For
example, certain of the polypeptides contain an epitope that binds an antibody that
binds to BSTP-ECG1. Certain of the polypeptides have amino acid sequences that
differ by less than 20, less than 10, or less than 5 amino acids from the amino acid
sequence of SEQ ID NO:1. Certain of the polypeptides retain at least one biological
activity, structural feature, or immunological activity of BSTP-ECG1.

The invention also encompasses BSTP-ECG1 variants. Certain BSTP-ECG1
variants are at least about 80%, more preferably at least about 90%, and most
preferably at least about 95% identical in amino acid sequence to a BSTP-ECG1
amino acid sequence, e.g., the amino acid sequence of SEQ ID NO:1. Certain variant
amino acid sequences differ by less than 20, less than 10, or less than 5 amino acids
from the amino acid sequence of SEQ ID NO:1.
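
By way of illustration only, the two measures used above (number of differing amino acids and percent identity) can be computed for a pair of sequences that have already been aligned to equal length. The function names and the short peptide strings below are hypothetical placeholders and are not taken from SEQ ID NO:1.

```python
def count_differences(aligned_a: str, aligned_b: str) -> int:
    """Count positions at which two pre-aligned, equal-length sequences differ."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned positions carrying the same residue."""
    identical = len(aligned_a) - count_differences(aligned_a, aligned_b)
    return 100.0 * identical / len(aligned_a)


# Placeholder sequences differing at a single position (I -> L).
reference = "MKTAYIAKQR"
variant = "MKTAYLAKQR"
print(count_differences(reference, variant), percent_identity(reference, variant))
```

A variant would then be tested against a chosen limit, e.g. `count_differences(reference, variant) < 5`.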

The invention also encompasses fragments of BSTP-ECG1. Preferred BSTP-ECG1
fragments retain at least one biological activity, structural feature, or
immunological activity of BSTP-ECG1. In certain embodiments of the invention the
fragments are between 5 and 15 amino acids in length. Such fragments are useful, for
example, as antigens for the generation of antibodies. In certain embodiments of the
invention the length of the fragments is at least 50%, at least 75%, at least 90%, at
least 95%, or at least 99% of the full length of BSTP-ECG1.

The invention also includes polynucleotides that encode BSTP-ECG1. In a 
particularly preferred embodiment the invention encompasses a polynucleotide 
comprising the polynucleotide sequence of SEQ ID NO: 2. In another preferred
embodiment the invention encompasses a polynucleotide comprising the 
polynucleotide sequence of SEQ ID NO: 3. In another preferred embodiment the 
invention encompasses a polynucleotide comprising the polynucleotide sequence of 
SEQ ID NO: 4. The invention further includes polynucleotides that encode the
inventive polypeptide variants described above.

The invention also encompasses a variant of a polynucleotide sequence
encoding BSTP-ECG1. Certain variants have at least about 80%, more preferably at
least about 90%, and most preferably at least about 95% sequence identity to a
polynucleotide sequence encoding BSTP-ECG1. Certain embodiments of the
invention include a variant of the polynucleotide sequence of SEQ ID NO: 2 which
has at least about 80%, at least about 90%, or at least about 95% sequence identity to
the polynucleotide sequence of SEQ ID NO: 2. In another embodiment the invention
includes a variant of the polynucleotide sequence of SEQ ID NO: 3 which has at least
about 80%, at least about 90%, or at least about 95% sequence identity to the
polynucleotide sequence of SEQ ID NO: 3. In another embodiment the invention
includes a variant of the polynucleotide sequence of SEQ ID NO: 4 which has at least
about 80%, at least about 90%, or at least about 95% sequence identity to the
polynucleotide sequence of SEQ ID NO: 4. Certain variant polynucleotide sequences
differ by less than 20, less than 10, or less than 5 nucleotides from the original
sequence. The invention further includes polynucleotides that encode the inventive
polypeptide variants and fragments described above. Certain polynucleotide variants
and fragments encode an amino acid sequence that contains at least one functional or
structural characteristic of BSTP-ECG1. Certain polynucleotide fragments comprise
at least about 50%, at least about 75%, at least about 80%, at least about 90%, or at
least about 95% of the polynucleotide sequence of SEQ ID NO: 2, the polynucleotide
sequence of SEQ ID NO: 3, or the polynucleotide sequence of SEQ ID NO: 4.

As is well known in the art, due to the degeneracy of the genetic code (i.e., the
fact that in many cases multiple different codons can code for the same amino acid),
multiple different polynucleotide sequences encode BSTP-ECG1 of the present
invention. The invention encompasses all of the sequences that can be made by
substituting alternative codons in accordance with the genetic code. It is noted that
such substitution must be made appropriately in view of the reading frame of the
polynucleotide. It is further noted that many of these polynucleotide sequences will
display little or no homology with the sequences in SEQ ID NO: 2, SEQ ID NO: 3, or
SEQ ID NO: 4. In certain embodiments of the invention it may be preferred to
employ polynucleotides encoding BSTP-ECG1 having a significantly different codon
usage from that found in naturally occurring BSTP-ECG1. For example, if the inventive
polynucleotides are to be used to express BSTP-ECG1 in a heterologous system such
as a bacterial or yeast expression system, it may be desirable to employ
polynucleotides having a codon usage preferred for optimal expression in the
heterologous system. Such codon usage preferences are well known in the art.
Altering the nucleotide sequence encoding BSTP-ECG1 may have additional uses
such as maximizing RNA stability, as it is well known in the art that RNA stability
can be affected by the sequence of the RNA.
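
The effect of codon degeneracy can be shown with a minimal sketch that substitutes synonymous codons in frame and confirms that the encoded amino acid sequence is unchanged. The sketch uses the standard genetic code; the function names, the nine-nucleotide example sequence, and the "preferred" codon choices are hypothetical and do not reflect any particular host organism or any sequence of the invention.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(dna: str) -> list:
    """Split a coding sequence into in-frame codons."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[codon] for codon in codons(dna))


def recode(dna: str, preferred: dict) -> str:
    """Replace each codon with the preferred synonymous codon, if one is given."""
    recoded = []
    for codon in codons(dna):
        amino_acid = CODON_TABLE[codon]
        new_codon = preferred.get(amino_acid, codon)
        if CODON_TABLE[new_codon] != amino_acid:
            raise ValueError("preferred codon does not encode the same amino acid")
        recoded.append(new_codon)
    return "".join(recoded)


original = "ATGCTTAGC"                       # placeholder: Met-Leu-Ser
preferred_codons = {"L": "CTG", "S": "TCC"}  # hypothetical host preferences
recoded = recode(original, preferred_codons)
print(recoded, translate(original) == translate(recoded))
```

Working codon by codon keeps every substitution within the reading frame, which is the constraint noted above.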

The invention also includes polynucleotides having a complementary
nucleotide sequence to any of the inventive polynucleotides described above. Such
complementary polynucleotides are useful as probes, e.g., to detect expression of the
inventive polynucleotides at the RNA level. Such complementary polynucleotides are
also useful as antisense reagents, to inhibit the expression of the corresponding genes
at the protein level, e.g., by interfering with mRNA translation. Inhibiting gene
expression has a variety of applications, e.g., it may be used to gain information about
the function of the encoded protein. In addition, antisense inhibition of gene
expression may be used therapeutically.
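
For illustration, the sequence of a complementary DNA strand can be derived by complementing each base and reversing the result so that it reads 5' to 3'. The sketch below is minimal, handles only the four standard DNA bases, and uses a hypothetical function name and a placeholder input sequence.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # placeholder sequence
```

Applying the function twice returns the starting sequence, which is a convenient sanity check when designing probes or antisense reagents.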

The invention encompasses polynucleotides that are able to hybridize to any of
the inventive polynucleotide sequences discussed above under various conditions of
stringency. In general, a hybridizing polynucleotide will have a sequence either
partially or fully complementary with the polynucleotide to which it hybridizes.
Hybridization conditions of various stringency are well known in the art and are, in
general, governed by the concentration of reagents such as salts and formamide in the
hybridization buffer as well as by the temperature at which hybridization is
performed. For example, a stringent hybridization can be performed by use of a
hybridization buffer comprising 30% formamide in 0.9M saline/0.09M sodium citrate
(SSC) buffer at a temperature of 45° C followed by washing twice with that SSC
buffer at 45° C. A moderately stringent hybridization condition could include use of a
hybridization buffer comprising 20% formamide in 0.8M saline/0.08M SSC buffer at
a temperature of 37° C followed by washing once with that SSC buffer at 37° C.
Further examples of stringent conditions are found in U.S. Patent No. 6,008,337; in
Maniatis, T., Sambrook, J. and Fritsch, E., Molecular Cloning: A Laboratory Manual
(3 Volume Set); and in numerous other sources known to one of ordinary skill in the
art. Appropriate stringency conditions to achieve particular degrees of hybridization
specificity when using cDNA or oligonucleotide arrays are also well known. The
selection of appropriate hybridization conditions will typically be determined by the
purpose for which hybridization is to be carried out and is a matter of choice for the
practitioner.

The invention further includes oligonucleotides comprising a fragment of a
polynucleotide encoding BSTP-ECG1 or comprising a fragment of a polynucleotide
complementary to such a polynucleotide. Preferred oligonucleotides are between 6
nucleotides and 60 nucleotides in length, preferably approximately 15 to 30
nucleotides in length, and more preferably between about 20 and 25 nucleotides in
length. Such oligonucleotides are useful, for example, as primers in PCR
amplification or in hybridization assays including microarray assays.
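
As a simple illustration of the length ranges recited above, candidate oligonucleotides can be enumerated as contiguous fragments of a polynucleotide within a chosen length window. The sketch below only lists fragments; it does not evaluate melting temperature, specificity, or any other primer design criterion. The function name and the 30-nucleotide input are hypothetical placeholders.

```python
def candidate_oligos(sequence: str, min_len: int = 20, max_len: int = 25):
    """Yield (start, fragment) for every contiguous fragment in the length window."""
    for length in range(min_len, max_len + 1):
        for start in range(0, len(sequence) - length + 1):
            yield start, sequence[start:start + length]


placeholder = "ACGTA" * 6  # 30-nucleotide placeholder, not a sequence of the invention
oligos = list(candidate_oligos(placeholder))
print(len(oligos))
```

The default window corresponds to the "between about 20 and 25 nucleotides" range; passing `min_len=6, max_len=60` would cover the broadest recited range.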

The invention contemplates the production of any of the polynucleotides or
fragments thereof described herein by chemical synthesis, by PCR, or by use of
expression vectors including the polynucleotides. Such expression vectors and
methods of their use are well known in the art. The inventive polynucleotides and
fragments thereof can be produced using an in vitro transcription system or within a
host cell containing a vector comprising the inventive polynucleotide sequence and
appropriate genetic control elements (e.g., enhancers, promoters, terminators)
operatively linked to the polynucleotide sequence so as to direct transcription
therefrom. In vitro transcription systems are well known in the art as are vectors
containing appropriate genetic control elements for directing transcription of an
inserted polynucleotide sequence and host cells in which such vectors are maintained.
The invention encompasses the production of either DNA or RNA having the
sequence of an inventive polynucleotide. One of ordinary skill in the art will be able
to select appropriate vectors and synthesis conditions depending upon whether it is
desired to produce DNA or RNA. It is noted that the inventive polynucleotides and
fragments thereof can also be synthesized entirely through chemical means.
Techniques and machines for chemical synthesis of polynucleotides are well known in
the art. The polynucleotides can be labeled or conjugated with detectable moieties
including radionuclides, enzymes, chromogenic substrates, fluorescent substances,
etc., using any of a variety of techniques.

Polynucleotides encoding BSTP-ECG1 can be extended, e.g., to identify
upstream elements such as promoters or other regulatory elements, using techniques
that are well known in the art. Such techniques are described, for example, in U.S.
Patent No. 6,008,337, which is herein incorporated by reference, and include a
variety of PCR-based methods, screening of cDNA libraries, primer extension, etc.
Genomic sequence such as introns can also be obtained. Discovery of additional
sequence may also be performed using computer-based searches of sequenced human
DNA. As is well known, large portions of the human genome have been sequenced,
but relatively little information exists as to the structure and organization of much of
the sequence. Thus extension of the inventive polynucleotide sequences may be
performed by careful examination of previously sequenced genomic DNA. Preferably
any predictions based on examination of genomic sequence are verified
experimentally, since it is well known in the art that a significant number of errors
exist in the genomic sequence and in the predictions (e.g., predictions of genes and
open reading frames) based thereon.

It is well known in the art that different cell types, cells at different stages of
differentiation, and/or cells within organisms at different developmental stages may
express variants of the same gene, e.g., variants derived by alternative splicing.
Therefore, multiple different mRNAs corresponding to the same gene may exist.
Such mRNAs are transcribed from the same region of genomic DNA but may vary in
sequence, usually lacking or containing domains corresponding to introns or regions
at the 5' and/or 3' end of the message. The present invention encompasses such
variant polynucleotides and their encoded polypeptides. Variant polynucleotides can
be identified and cloned by screening cDNA libraries produced using mRNA from
cells of various different types, differentiation states, or from tissues at different
developmental stages.

The invention contemplates production of any of the BSTP-ECG1
polypeptides or fragments thereof using any of a variety of techniques including both
in vivo and in vitro synthesis. For example, polynucleotides encoding an inventive
polypeptide can be inserted into an expression vector, which can then be introduced
into an appropriate host cell (e.g., a bacterial, yeast, insect, or mammalian cell). Thus
the invention includes an expression vector comprising a polynucleotide comprising a
nucleotide sequence set forth in SEQ ID NO: 2, SEQ ID NO: 3, or SEQ ID NO: 4 or
comprising a variant of a nucleotide sequence set forth in SEQ ID NO: 2, SEQ ID
NO: 3, or SEQ ID NO: 4. The invention further includes a host cell comprising any
of the aforementioned vectors. A wide variety of vector/host expression systems are
known in the art. In general, vectors contain necessary control and regulatory
sequences (e.g., enhancers, promoters, polyadenylation sequence, etc.) operatively
linked to an inserted polynucleotide so as to direct expression of the polypeptide in
the appropriate host cell. Depending upon the host cell to be employed, appropriate
vectors may include phages, viruses, or plasmids. The invention encompasses any
available vector/host expression system and specifically includes vectors that direct
expression of an inventive polynucleotide, vectors that direct expression of an
inventive polypeptide (e.g., expression vectors), in addition to cells and cell lines
transformed with such vectors. In certain embodiments of the invention the inventive
polypeptide is secreted from a cell transformed with an expression vector, thereby
allowing purification from the medium rather than by harvesting the cells. The
invention also encompasses the production of inventive polypeptides in cells that have
been engineered to express such polypeptides according to the methods described in
U.S. Patent No. 6,063,630, which discloses methods of "turning on" an endogenous
gene in cells that normally express the gene at low or undetectable levels. Methods
for harvesting, isolating, purifying, etc., polypeptides from cells expressing such
polypeptides are well known in the art.

BSTP-ECG1 and BSTP-ECG1 variants and fragments thereof can also be
produced in animals or plants that are transgenic for a polynucleotide sequence 
encoding the polypeptide. The invention includes such animals and plants. In 
addition to their potential use as sources for the inventive polypeptides, transgenic 
animals may be used to study the function of the inventive polypeptides. Methods
for the production of transgenic animals and plants, as well as methods for purifying 
and harvesting inventive polypeptides from such animals and plants are well known in 
the art and are within the scope of the invention. The invention also encompasses 
cells and transgenic animals that have been engineered to lack expression of the
inventive polynucleotides. Methods for "knocking out" a gene using the technique of
homologous recombination and methods of creating cells and organisms lacking
expression of the knocked out gene are well known in the art and are described, for
example, in U.S. Patent No. 5,464,764, U.S. Patent No. 5,487,992, U.S. Patent No.
5,627,059, and U.S. Patent No. 5,631,153.

As will be appreciated by one of ordinary skill in the art, in certain
circumstances it may be advantageous to modify an inventive polynucleotide
sequence by ligating it to a heterologous sequence, thereby enabling the production of
a fusion protein. Certain vectors are designed to incorporate such heterologous
sequences so that insertion of a polynucleotide into the vector at an appropriate
location results in an in-frame fusion to the heterologous sequence, which may be
either upstream or downstream from the inserted polynucleotide. Such heterologous
sequences may encode tags or cleavable linker sequences such as glutathione
S-transferase (GST), the hemagglutinin epitope known as HA tag, a short stretch of the
Myc protein (Myc tag), FLAG tag, 6X His tag, maltose binding protein tag, etc. In
general, many of these tags are useful for the purification and/or detection of the
polypeptide using an antibody or other reagent that binds to the tag. Other useful
heterologous sequences include that of green fluorescent protein (GFP), which allows
visualization of the fusion protein. The present invention encompasses all such
BSTP-ECG1 fusion proteins.

The invention provides an antibody that binds to BSTP-ECG1 or to a fragment
thereof. Antibodies to these polypeptides may be generated, for example, as described
in Example 6 below. In general, such antibodies may be generated by methods well
known in the art and described, for example, in Harlow, E., and Lane, D., Antibodies:
A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
1988. Details and references for the production of antibodies based on an inventive
polypeptide may also be found in U.S. Patent No. 6,008,337. Antibodies may
include, but are not limited to, polyclonal, monoclonal, chimeric (e.g., "humanized"),
and single chain antibodies, and Fab fragments. The invention encompasses "fully
human" antibodies produced using the XenoMouse™ technology (AbGenix Corp.,
Fremont, CA) according to the techniques described in U.S. Patent No. 6,075,181.

B. Diagnostics and methods of use thereof

The invention provides reagents and methods for detecting the inventive 
polynucleotides and polypeptides described above. The reagents are useful for 
diagnostic purposes among others. In one aspect, the invention provides a method of 
classifying tumors by detecting the presence of one or more of the inventive
polypeptides or polynucleotides, i.e., polypeptides or polynucleotides comprising a
sequence set forth in SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO:3, SEQ ID NO:4, 
SEQ ID NO: 5, or fragments thereof. As is well known in the art, a polypeptide may 
be detected using a variety of techniques that employ an antibody that binds to the 
polypeptide. In addition, in certain embodiments of the invention the polypeptide is
detected using other modalities directed at the detection of the polypeptide, such as
aptamers (Aptamers, Molecular Diagnosis, Vol. 4, No. 4, 1999). In general, any 
appropriate method for detecting a polypeptide may be used in conjunction with the 
present invention, although antibodies represent a preferred modality. 

Thus the invention provides a method for classifying a disease comprising the
steps of: (a) obtaining cells or tissue from a site of disease; (b) detecting a polypeptide
having a sequence selected from the group consisting of SEQ ID NO: 1, or variants 
thereof; and (c) placing the disease into one of a set of predetermined categories based 
on detection of the polypeptide. The predetermined categories are, in general, 
characterized by differences in their expression of BSTP-ECG1 and also by
differences in some aspect of tumor phenotype. For example, and without intending
to be limiting, one predetermined category may be characterized by a good prognosis; 
one predetermined category may be characterized by a poor prognosis; and one 
predetermined category may be characterized by a non-responder phenotype. In
addition to differences in phenotype, the predetermined categories also feature 
differences in the expression of BSTP-ECG1, thus allowing the placement of a tumor 
into one of the categories based on the detection (which may include measurement) of 
BSTP-ECG1 in a sample from the tumor. Thus the assignment may be used as a basis
for providing diagnostic, prognostic, and/or predictive information to a patient having 
the tumor. 

The invention encompasses a number of uses for antibodies that bind to 
BSTP-ECG1. In a broad aspect, antibodies that bind to BSTP-ECG1 can be used to
provide information useful for diagnosis, prognosis, classification, or monitoring of a
disorder characterized by the inappropriate expression of the polypeptide. The term 
"inappropriate expression", as used herein, can include, but is not limited to, 
underexpression, overexpression, or expression in a cell type or organ in which the 
polypeptide is not normally expressed. Inappropriate expression can include (1)
underexpression relative to normal for that cell or tissue, (2) overexpression relative to
normal for that cell or tissue, or (3) mislocalization within a cell or tissue relative to 
normal for that cell or tissue. In order to provide a basis for diagnosis, prognosis, 
classification, or monitoring of a disorder characterized by the inappropriate 
expression of the BSTP-ECG1 polypeptide, a normal or standard expression profile
can first be established by measuring the expression of BSTP-ECG1 in cells, tissues,
body fluids, etc., obtained from subjects not suffering from the disorder. In general, a 
range of values may be considered normal, and departure from within this range of 
values may be taken to indicate that an individual suffers from or is at increased 
likelihood to develop the disorder. Of course in certain cases the information will
simply consist of an indication of the presence or absence of BSTP-ECG1 in a cell or
tissue sample (e.g., a sample of breast cancer cells or tissue), regardless of whether 
BSTP-ECG1 expression is considered "inappropriate". 
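
The comparison against a normal range described above can be sketched as follows. The sketch is illustrative only: the specification does not state how the normal range is to be derived, so the mean plus or minus two standard deviations used here is purely an assumption, as are the function names and the expression values, which are arbitrary placeholder numbers in arbitrary units.

```python
import statistics


def reference_range(normal_values, n_sd: float = 2.0):
    """Derive a (low, high) range from measurements in unaffected subjects.

    Assumption: range = mean +/- n_sd sample standard deviations.
    """
    mean = statistics.mean(normal_values)
    sd = statistics.stdev(normal_values)
    return mean - n_sd * sd, mean + n_sd * sd


def classify_expression(value: float, low: float, high: float) -> str:
    """Label a measurement relative to the reference range."""
    if value < low:
        return "underexpression"
    if value > high:
        return "overexpression"
    return "within normal range"


normal_measurements = [8.0, 9.0, 10.0, 11.0, 12.0]  # placeholder values
low, high = reference_range(normal_measurements)
print(classify_expression(20.0, low, high))
```

Mislocalization, the third form of inappropriate expression mentioned above, is not captured by a single numeric level and is outside the scope of this sketch.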

As used herein the term "diagnostic information" includes, but is not limited 
to, any type of information that is useful in determining whether a patient has, or is at
increased risk for developing, a disease or disorder; for providing a prognosis for a
patient having a disease or disorder; for classifying a disease or disorder; for 
monitoring a patient for recurrence of a disease or disorder; for selecting a preferred
therapy; for predicting the likelihood of response to a therapy, etc. In a preferred
embodiment of the invention, antibodies to BSTP-ECG1 are used for providing 
diagnostic information for cancer, particularly for breast cancer. In general, 
diagnostic assays in which the antibodies may be employed include methods that use 
the antibody to detect BSTP-ECG1 in a tissue sample, cell sample, body fluid sample
(e.g., serum), cell extract, etc. Thus the invention provides a method for detecting a 
polypeptide comprising an amino acid sequence set forth in SEQ ID NO: 1 in a 
biological sample comprising steps of: (a) contacting the biological sample with an 
antibody that binds to the polypeptide of SEQ ID NO: 1; and (b) determining whether
the antibody specifically binds to the sample, the binding being an indication that the
sample contains the polypeptide. The biological sample can be processed in any of a 
variety of ways prior to being placed in contact with the antibody. 

Many detection methods typically involve the use of a labeled secondary 
antibody that recognizes the primary antibody (i.e., the antibody that binds to the
polypeptide being detected). Depending upon the nature of the sample, appropriate
methods include, but are not limited to, immunohistochemistry, radioimmunoassay, 
ELISA, immunoblotting, and FACS analysis. In the case where the polypeptide is to 
be detected in a tissue sample, e.g., a biopsy sample, immunohistochemistry is a 
particularly appropriate detection method. Techniques for obtaining tissue and cell
samples and performing immunohistochemistry and FACS are well known in the art.
Such techniques are routinely used, for example, to detect the ER in breast tumor 
tissue or cell samples. In general, such tests will include a negative control, which can 
involve applying the test to normal tissue so that the signal obtained thereby can be 
compared with the signal obtained from the sample being tested. In tests in which a
secondary antibody is used to detect the antibody that binds to the polypeptide of
interest, an appropriate negative control can involve performing the test on a portion 
of the sample with the omission of the antibody that binds to the polypeptide to be 
detected, i.e., with the omission of the primary antibody. Antibodies suitable for use 
as diagnostics generally exhibit high specificity for the target polypeptide and low
background. In general, monoclonal antibodies are preferred for diagnostic purposes.

In general, the results of such a test can be presented in any of a variety of
formats. The results can be presented in a qualitative fashion. For example, the test 
report may indicate only whether or not a particular polypeptide was detected, perhaps 
also with an indication of the limits of detection. The results may be presented in a 
semi-quantitative fashion. For example, various ranges may be defined, and the
ranges may be assigned a score (e.g., 1+ to 4+) that provides a certain degree of 
quantitative information. Such a score may reflect various factors, e.g., the number of 
cells in which the polypeptide is detected, the intensity of the signal (which may
indicate the level of expression of the polypeptide), etc. The results may be presented
in a quantitative fashion, e.g., as a percentage of cells in which the polypeptide is
detected, as a protein concentration, etc. As will be appreciated by one of ordinary
skill in the art, the type of output provided by a test will vary depending upon the 
technical limitations of the test and the biological significance associated with 
detection of the polypeptide. For example, in the case of certain polypeptides a purely 

1 5 qualitative output (e.g., whether or not the polypeptide is detected at a certain 

detection level) provides significant information. In other cases a more quantitative 
output (e.g., a ratio of the level of expression of the polypeptide in the sample being 
tested versus the normal level) is necessary. 
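
By way of illustration only, the qualitative, semi-quantitative, and quantitative output formats described above might be sketched as follows. The function names, the detection limit, the cutoffs that map a percentage of positive cells onto a 1+ to 4+ score, and the "0" score for results below the detection limit are hypothetical assumptions chosen for the example; none of them is specified by the invention.

```python
from dataclasses import dataclass


@dataclass
class StainResult:
    positive_cells: int  # cells in which the polypeptide was detected
    total_cells: int     # cells examined


def percent_positive(r: StainResult) -> float:
    """Quantitative output: percentage of cells in which the polypeptide is detected."""
    if r.total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return 100.0 * r.positive_cells / r.total_cells


def qualitative(r: StainResult, detection_limit_pct: float = 1.0) -> bool:
    """Qualitative output: detected or not, relative to an assumed detection limit."""
    return percent_positive(r) >= detection_limit_pct


def semi_quantitative(r: StainResult) -> str:
    """Semi-quantitative output: map ranges onto a score (hypothetical cutoffs)."""
    pct = percent_positive(r)
    if pct < 1.0:
        return "0"  # assumed label for results below the detection limit
    for upper, score in ((25.0, "1+"), (50.0, "2+"), (75.0, "3+")):
        if pct < upper:
            return score
    return "4+"


def expression_ratio(sample_level: float, normal_level: float) -> float:
    """Ratio of the expression level in the tested sample to the normal level."""
    if normal_level <= 0:
        raise ValueError("normal_level must be positive")
    return sample_level / normal_level
```

In practice the ranges and the detection limit would be set according to the technical limitations of the particular test and the biological significance of the polypeptide, as noted above.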

A particular use for antibodies that bind to BSTP-ECG1 is to classify breast 
tumors based on the association between the expression level of the BST-ECG1 gene, 
or the association between a gene subset that includes the BST-ECG1 gene, and a 
tumor subset having a particular phenotype (e.g., a good prognosis phenotype, a poor 
prognosis phenotype, a non-responder phenotype, etc.). Immunohistochemistry using 
an antibody that binds to BSTP-ECG1 can be employed to detect BSTP-ECG1 in a 
tumor tissue or cell sample, thereby providing information useful in determining 
whether the tumor expresses or overexpresses the polypeptide. Additional detection 
methods include RIA, ELISA, immunoblotting, and FACS analysis. Thus the 
present invention provides a test for classifying tumors. 

The result of such a test (e.g., whether a given tumor expresses or 
overexpresses BSTP-ECG1, quantitative expression level of BSTP-ECG1, etc.) can be 
used to provide information about the prognosis of the tumor or the likelihood that the 
tumor will respond to therapy. In certain embodiments of the inventive methods a 
single antibody is used whereas in other embodiments of the invention multiple 
antibodies, directed either against the same or against different polypeptides, can be 
used to increase the sensitivity or specificity of the test or to provide more detailed 
information than that provided by a single antibody. Thus the invention encompasses 
the use of a battery of antibodies, one or more of which binds to BSTP-ECG1. 
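
One way to see how a battery of antibodies can shift sensitivity or specificity is to combine the per-antibody results under different rules. The following is a minimal sketch, assuming each antibody yields a simple positive or negative call for a sample; the function names and combination rules are hypothetical and are offered only as an illustration. Calling a sample positive when any antibody is positive tends to raise sensitivity, whereas requiring all antibodies to be positive tends to raise specificity.

```python
from typing import Sequence


def call_any(results: Sequence[bool]) -> bool:
    """Positive if at least one antibody in the battery gives a positive call."""
    return any(results)


def call_all(results: Sequence[bool]) -> bool:
    """Positive only if every antibody in the battery gives a positive call."""
    return all(results)


def sensitivity(calls: Sequence[bool], truth: Sequence[bool]) -> float:
    """Fraction of truly positive samples that were called positive."""
    positives = [c for c, t in zip(calls, truth) if t]
    if not positives:
        raise ValueError("no truly positive samples")
    return sum(1 for c in positives if c) / len(positives)


def specificity(calls: Sequence[bool], truth: Sequence[bool]) -> float:
    """Fraction of truly negative samples that were called negative."""
    negatives = [c for c, t in zip(calls, truth) if not t]
    if not negatives:
        raise ValueError("no truly negative samples")
    return sum(1 for c in negatives if not c) / len(negatives)
```

Applied to a set of samples of known status, the two rules can be compared directly by computing `sensitivity` and `specificity` for the calls produced by `call_any` and by `call_all`.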

Although in many cases detection of polypeptides using antibodies represents 
the most convenient means of determining whether BST-ECG1 is expressed (or 
overexpressed) in a particular sample, the invention also encompasses the use of 
polynucleotides for this purpose. The invention provides methods for detecting a 
polynucleotide encoding a polypeptide comprising an amino acid sequence set forth in 
SEQ ID NO:1, or a fragment thereof, in a biological sample. One such method 
comprises steps of: (a) hybridizing a nucleic acid complementary to the 
polynucleotide encoding a polypeptide comprising an amino acid sequence set forth in 
SEQ ID NO:1, or a fragment thereof, to at least one nucleic acid in the biological 
sample, thereby forming a hybridization complex; and (b) detecting the hybridization 
complex, wherein the presence of the hybridization complex indicates the presence of 
a polynucleotide encoding the polypeptide in the biological sample. A second such 
method comprises steps of: 

(a) hybridizing a nucleic acid encoding a polypeptide comprising an amino acid 
sequence set forth in SEQ ID NO:1, or a fragment thereof, to at least one nucleic acid 
complementary to a nucleic acid in the biological sample, thereby forming a 
hybridization complex; and 

(b) detecting the hybridization complex, wherein the presence of the hybridization 
complex indicates the presence of a polynucleotide encoding the polypeptide in the 
biological sample. In other words, detection can comprise detecting either a 
polynucleotide encoding a polypeptide comprising an amino acid sequence set forth in 
SEQ ID NO:1, or detecting the complement of such a polynucleotide. For example, 
in certain embodiments of the inventive method mRNA in the biological sample is 
detected while in other embodiments of the invention cDNA synthesized from mRNA 
in the biological sample is detected. 
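
The relationship between a target polynucleotide and a probe complementary to it can be illustrated with a short sketch. The function names are hypothetical, and any sequences supplied to them are placeholders rather than portions of any sequence disclosed herein. The sketch tests only for perfect base complementarity; actual hybridization also depends on conditions such as temperature and salt concentration, which are ignored here.

```python
# Translation table giving the DNA complement of each DNA or RNA base.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def can_hybridize(probe: str, target: str) -> bool:
    """True if the probe is perfectly complementary to some stretch of the target."""
    # Express the target in the DNA alphabet so mRNA and cDNA targets compare alike.
    target_dna = target.upper().replace("U", "T")
    return reverse_complement(probe).upper() in target_dna
```

A probe designed against an mRNA is the reverse complement of a stretch of that mRNA, whereas a probe designed against cDNA synthesized from the mRNA has the same sense as the mRNA itself, which corresponds to the two methods set forth above.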

The hybridization complex can be detected in any of a variety of ways. For 
example, the hybridization complex may be formed on a microarray (e.g., a cDNA 
array or an oligonucleotide array) and detected using techniques such as those 
described herein or related techniques well known in the art. Microarray analysis is 
but one means by which polynucleotides can be used to detect or measure BST-ECG1 
expression. Expression of BST-ECG1 can also be measured by a variety of other 
techniques that make use of a polynucleotide corresponding to part or all of the 
BST-ECG1 gene rather than an antibody that binds to a polypeptide encoded by the gene. 
Appropriate techniques include, but are not limited to, in situ hybridization, Northern 
blot, and various nucleic acid amplification techniques such as PCR, quantitative 
PCR, and the ligase chain reaction. 

Another aspect of the invention comprises a kit to test for the presence of any 
of the inventive polynucleotides or polypeptides, e.g., in a tissue sample or in a body 
fluid. The kit can comprise, for example, an antibody for detection of a polypeptide 
(e.g., the polypeptide of SEQ ID NO:1) or a probe for detection of a polynucleotide. 
In addition, the kit can comprise a reference sample; instructions for processing 
samples, performing the test, and interpreting the results; and buffers and other reagents 
necessary for performing the test. In certain embodiments the kit can comprise a 
panel of antibodies. In certain embodiments the kit comprises a cDNA or 
oligonucleotide array for detection of expression of the gene encoding BSTP-ECG1, 
e.g., for detecting the presence of a polynucleotide comprising the nucleotide 
sequence set forth in SEQ ID NO:2, SEQ ID NO:3, or SEQ ID NO:4. 

Yet another aspect of the invention comprises detecting mutations in the gene 
encoding BSTP-ECG1 and/or in regulatory regions of the gene. The invention further 
encompasses detecting allelic variants of the gene encoding BSTP-ECG1. As 
mentioned above, mutations in certain genes (e.g., BRCA-1, BRCA-2) have been 
associated with an increased risk of breast cancer. The detection of mutations and 
allelic variants can be performed using any of a variety of methods well known in the 
art, including use of microarrays (e.g., oligonucleotide arrays) to detect single 
nucleotide polymorphisms (SNPs) associated with a particular allele, use of 
microarrays (e.g., oligonucleotide arrays) to detect substitutions, deletions, etc., 
detection of restriction fragment length polymorphisms (RFLPs), direct sequencing 
of DNA isolated from an individual, etc. 

C. Therapeutics 

The invention includes the use of the polynucleotides, polypeptides, and 
antibodies described herein as therapeutic agents for the treatment of cancer. In 
particular, the invention contemplates the use of the polynucleotides, polypeptides, 
and antibodies described herein for the treatment of breast cancer, although their use 
for the treatment of other forms of cancer is also within the scope of the invention. 
The invention specifically encompasses antagonists to BSTP-ECG1. Such 
antagonists (which include, but are not limited to, antibodies, small molecules, and 
antisense nucleic acids) may be produced or identified using any of a variety of 
methods known in the art. For example, a purified inventive BSTP-ECG1 
polypeptide or fragment thereof may be used to raise antibodies or to screen libraries 
of compounds to identify those that specifically bind to the polypeptide. While not 
wishing to be bound by any theory, the presence of a putative transmembrane domain, 
suggesting that BSTP-ECG1 is likely to include an extracellular region, may make it a 
particularly preferred target for therapeutics. 

Preferably antibodies suitable for use as therapeutics exhibit high specificity 
for the target polypeptide and low background binding to other polypeptides. In 
general, monoclonal antibodies are preferred for therapeutic purposes. In the case of 
breast cancer, antibodies against the HER2/neu/ErbB2 polypeptide (a polypeptide 
homologous to the epidermal growth factor receptor) represent a paradigm in terms of 
the development of therapeutic antibodies. The HER2/neu/ErbB2 gene is 
overexpressed in approximately 25 to 30 percent of metastatic breast tumors, and an 
antibody against the HER2/neu/ErbB2 polypeptide, Herceptin® (Trastuzumab), is 
approved for the treatment of certain patients with metastatic breast cancer, 
confirming the utility of therapeutic antibodies directed against polypeptides that are 
specifically overexpressed in particular tumor subsets. Antibodies directed against a 
polypeptide expressed by a cell may have a number of mechanisms of action. In 
certain instances, e.g., in the case of a polypeptide that exerts a growth stimulatory 
effect on a cell, antibodies may directly antagonize the effect of the polypeptide and 
thereby arrest tumor progression, trigger apoptosis, etc. While not wishing to be 
bound by any theory, it may be particularly likely that certain genes that are 
overexpressed in tumors having a poor prognosis encode polypeptides that have a 
growth stimulatory effect on tumor cells or facilitate the growth of such cells in some 
other way, e.g., by enhancing angiogenesis, by allowing cells to overcome normal 
growth regulatory mechanisms, or by blocking mechanisms that would normally lead 
to elimination of mutated or otherwise abnormal cells. In certain embodiments of the 
invention the antibody may serve to target a toxic moiety to the cell. Thus the 
invention encompasses the use of antibodies that have been conjugated with a 
cytotoxic agent, e.g., a toxin such as ricin or diphtheria toxin, a radioactive moiety, 
etc. Such antibodies can be used to direct the cytotoxic agent specifically to cells that 
express the inventive polypeptide. 

Although certain antagonists may function through direct interaction with a 
polypeptide such as BSTP-ECG1, e.g., by inhibiting its activity, others may function 
by affecting expression of the polypeptide. Reduction in expression of an 
endogenously produced polypeptide may be achieved by the administration of 
antisense nucleic acids (e.g., oligonucleotides, RNA, DNA, most typically 
oligonucleotides that have been modified to improve stability or targeting) or peptide 
nucleic acids comprising sequences complementary to those of the mRNA that 
encodes the polypeptide. Antisense technology and its applications are described in 
Phillips, M.I. (ed.) Antisense Technology, Methods Enzymol., Volumes 313 and 314, 
Academic Press, San Diego, 2000, and references mentioned therein. Ribozymes 
(catalytic RNA molecules that are capable of cleaving other RNA molecules) 
represent another approach to reducing gene expression. Such ribozymes can be 
designed to cleave specific mRNAs corresponding to a gene of interest. Their use is 
described in U.S. Patent No. 5,972,621, and references therein. The invention 
encompasses the delivery of antisense and/or ribozyme molecules via a gene therapy 
approach in which vectors or cells expressing the antisense molecules are 
administered to an individual. 

It may also be desirable to increase the expression of a gene encoding 
BSTP-ECG1 or to increase the activity of BSTP-ECG1. For example, in the case of genes 
(such as that encoding BSTP-ECG1) whose expression correlates with that of the 
estrogen receptor and therefore is associated with a good prognosis, it may be 
desirable to increase the expression of such genes or the activity of the corresponding 
polypeptides in tumors that fail to express these genes. 

Small molecule modulators (e.g., inhibitors or activators) of BST-ECG1 gene 
expression are also within the scope of the invention and may be detected by 
screening libraries of compounds using, for example, cell lines that express the 
polypeptide or a version of the polypeptide that has been modified to include a readily 
detectable moiety. Methods for identifying compounds capable of modulating gene 
expression are described, for example, in U.S. Patent No. 5,976,793. The screening 
methods described therein are particularly appropriate for identifying compounds that 
do not naturally occur within cells and that modulate the expression of genes of 
interest whose expression is associated with a defined physiological or pathological 
effect within a multicellular organism. 

More generally, the invention encompasses compounds that modulate the 
activity of the BSTP-ECG1 polypeptide. Methods of screening for such 
interacting compounds are well known in the art and depend, to a certain degree, on 
the particular properties and activities of the polypeptide encoded by the gene. 
Representative examples of such screening methods may be found, for example, in 
U.S. Patent No. 5,985,829, U.S. Patent No. 5,726,025, U.S. Patent No. 5,972,621, and 
U.S. Patent No. 6,015,692. The skilled practitioner will readily be able to modify and 
adapt these methods as appropriate for BSTP-ECG1. The mechanism of modulation 
need not be direct. For example, the modulator may act on an enzyme that may 
modify BSTP-ECG1. 

The invention also encompasses the use of polynucleotides encoding 
BSTP-ECG1, or portions thereof, as DNA vaccines. Such vaccines comprise polynucleotide 
sequences, typically inserted into vectors, that direct the expression of an antigenic 
polypeptide within the body of the individual being immunized. Details regarding the 
development of vaccines, including DNA vaccines, for various forms of cancer may 
be found, for example, in Brinckerhoff L.H., Thompson L.W., Slingluff C.L., Jr., 
Melanoma Vaccines, Curr. Opin. Oncol., 12(2):163-73, 2000 and in Stevenson, F.K., 
DNA vaccines against cancer: from genes to therapy, Ann. Oncol., 10(12):1413-8, 
1999 and references therein. BSTP-ECG1 polypeptides, or fragments thereof, 
may also find use as cancer vaccines. Any of these vaccines may be used for the 
prevention and/or the treatment of cancer. 

The invention includes pharmaceutical compositions comprising the inventive 
polypeptides, polynucleotides, antibodies, small molecule inhibitors, agonists, or 
antagonists described above. In general, a pharmaceutical composition will include 
an active agent in addition to one or more inactive agents such as a sterile, 
biocompatible carrier including, but not limited to, sterile water, saline, buffered 
saline, or dextrose solution. The pharmaceutical compositions may be administered 
either alone or in combination with other therapeutic agents including other 
chemotherapeutic agents, hormones, vaccines, and/or radiation therapy. By "in 
combination with", it is not intended to imply that the agents must be administered at 
the same time or formulated for delivery together, although these methods of delivery 
are within the scope of the invention. In general, each agent will be administered at a 
dose and on a time schedule determined for that agent. Additionally, the invention 
encompasses the delivery of the inventive pharmaceutical compositions in 
combination with agents that may improve their bioavailability, reduce or modify 
their metabolism, inhibit their excretion, or modify their distribution within the body. 
The invention encompasses treating cancer, particularly breast cancer, by 
administering the pharmaceutical compositions of the invention. Although the 
pharmaceutical compositions of the present invention can be used for treatment of any 
subject (e.g., any animal) in need thereof, they are most preferably used in the 
treatment of humans. 

The pharmaceutical compositions of this invention can be administered to 
humans and other animals by a variety of routes including oral, intravenous, 
intramuscular, intraarterial, subcutaneous, intraventricular, transdermal, rectal, 
intravaginal, intraperitoneal, topical (as by powders, ointments, or drops), buccal, or as 
an oral or nasal spray or aerosol. In general the most appropriate route of 
administration will depend upon a variety of factors including the nature of the 
compound (e.g., its stability in the environment of the gastrointestinal tract), the 
condition of the patient (e.g., whether the patient is able to tolerate oral 
administration), etc. At present the intravenous route is most commonly used to 
deliver therapeutic antibodies and nucleic acids. However, the invention encompasses 
the delivery of the inventive pharmaceutical composition by any appropriate route 
taking into consideration likely advances in the sciences of drug delivery. 

General considerations in the formulation and manufacture of pharmaceutical 
agents may be found, for example, in Remington's Pharmaceutical Sciences, 19th ed., 
Mack Publishing Co., Easton, PA, 1995. It will be appreciated that certain of the 
compounds of the present invention can exist in free form for treatment, or, where 
appropriate, in salt form, as discussed in more detail below. Compounds to be 
utilized in the pharmaceutical compositions include compounds existing in free form 
or pharmaceutically acceptable derivatives thereof, as defined herein, such as 
pharmaceutically acceptable salts, esters, salts of such esters, or any other adduct or 
derivative, which upon administration to a patient in need, is capable of providing, 
directly or indirectly, a compound as otherwise described herein, or a metabolite or 
residue thereof, e.g., a prodrug. Thus, as used herein, the term "pharmaceutically 
acceptable salt" refers to those salts which are, within the scope of sound medical 
judgment, suitable for use in contact with the tissues of humans and lower animals 
without undue toxicity, irritation, allergic response and the like, and are 
commensurate with a reasonable benefit/risk ratio. Pharmaceutically acceptable salts 
are well known in the art. For example, S. M. Berge, et al. describe pharmaceutically 
acceptable salts in detail in J. Pharmaceutical Sciences, 66:1-19 (1977), incorporated 
herein by reference. The salts can be prepared in situ during the final isolation and 
purification of the compounds of the invention, or separately by reacting the free base 
function with a suitable organic acid. Examples of pharmaceutically acceptable, 
nontoxic acid addition salts are salts of an amino group formed with inorganic acids 
such as hydrochloric acid, hydrobromic acid, phosphoric acid, sulfuric acid and 
perchloric acid or with organic acids such as acetic acid, oxalic acid, maleic acid, 
tartaric acid, citric acid, succinic acid, or malonic acid or by using other methods used 
in the art such as ion exchange. Other pharmaceutically acceptable salts include 
adipate, alginate, ascorbate, aspartate, benzenesulfonate, benzoate, bisulfate, borate, 
butyrate, camphorate, camphorsulfonate, citrate, cyclopentanepropionate, digluconate, 
dodecylsulfate, ethanesulfonate, formate, fumarate, glucoheptonate, glycerophosphate, 
gluconate, hemisulfate, heptanoate, hexanoate, hydroiodide, 
2-hydroxy-ethanesulfonate, lactobionate, lactate, laurate, lauryl sulfate, malate, maleate, 
malonate, methanesulfonate, 2-naphthalenesulfonate, nicotinate, nitrate, oleate, 
oxalate, palmitate, pamoate, pectinate, persulfate, 3-phenylpropionate, phosphate, 
picrate, pivalate, propionate, stearate, succinate, sulfate, tartrate, thiocyanate, 
p-toluenesulfonate, undecanoate, valerate salts, and the like. Representative alkali or 
alkaline earth metal salts include sodium, lithium, potassium, calcium, magnesium, 
and the like. Further pharmaceutically acceptable salts include, when appropriate, 
nontoxic ammonium, quaternary ammonium, and amine cations formed using 
counterions such as halide, hydroxide, carboxylate, sulfate, phosphate, nitrate, lower 
alkyl sulfonate and aryl sulfonate. 

Additionally, as used herein, the term "pharmaceutically acceptable ester" 
refers to esters that hydrolyze in vivo and include those that break down readily in the 
human body to leave the parent compound or a salt thereof. Suitable ester groups 
include, for example, those derived from pharmaceutically acceptable aliphatic 
carboxylic acids, particularly alkanoic, alkenoic, cycloalkanoic and alkanedioic acids, 
in which each alkyl or alkenyl moiety advantageously has not more than 6 carbon 
atoms. Examples of particular suitable esters include formates, acetates, propionates, 
butyrates, acrylates and ethylsuccinates. 

Furthermore, the term "pharmaceutically acceptable prodrugs" as used herein 
refers to those prodrugs of the compounds of the present invention that are, within the 
scope of sound medical judgment, suitable for use in contact with the tissues of 
humans and lower animals without undue toxicity, irritation, allergic response, and 
the like, commensurate with a reasonable benefit/risk ratio, and effective for their 
intended use, as well as the zwitterionic forms, where possible, of the compounds of 
the invention. The term "prodrug" refers to compounds that are rapidly transformed in 
vivo to yield a particular active compound, for example by hydrolysis in blood. A 
thorough discussion is provided in T. Higuchi and V. Stella, "Pro-drugs as Novel 
Delivery Systems", Vol. 14 of the A.C.S. Symposium Series, and in Edward B. 
Roche, ed., Bioreversible Carriers in Drug Design, American Pharmaceutical 
Association and Pergamon Press, 1987, both of which are incorporated herein by 
reference. 

As mentioned above, the pharmaceutical compositions of the present invention 
additionally comprise a pharmaceutically acceptable carrier, which, as used herein, 
means a non-toxic, inert solid, semi-solid or liquid filler, diluent, encapsulating 
material, or formulation auxiliary of any type. Some examples of materials which can 
serve as pharmaceutically acceptable carriers are sugars such as lactose, glucose and 
sucrose; starches such as corn starch and potato starch; cellulose and its derivatives 
such as sodium carboxymethyl cellulose, ethyl cellulose and cellulose acetate; 
powdered tragacanth; malt; gelatin; talc; excipients such as cocoa butter and 
suppository waxes; oils such as peanut oil, cottonseed oil, safflower oil, sesame oil, 
olive oil, corn oil and soybean oil; glycols such as propylene glycol; esters such as 
ethyl oleate and ethyl laurate; agar; buffering agents such as magnesium hydroxide 
and aluminum hydroxide; alginic acid; water; isotonic saline; Ringer's solution; ethyl 
alcohol; phosphate buffer solutions; and dextrose solutions. Other non-toxic 
compatible lubricants such as sodium lauryl sulfate and magnesium stearate, as well 
as coloring agents, releasing agents, coating agents, sweetening, flavoring and 
perfuming agents, preservatives and antioxidants, can also be present in the 
composition, according to the judgment of the formulator. 

Liquid dosage forms for oral administration include pharmaceutically 
acceptable emulsions, microemulsions, solutions, suspensions, syrups and elixirs. In 
addition to the active compounds, the liquid dosage forms may contain inert diluents 
commonly used in the art such as, for example, water or other solvents, solubilizing 
agents and emulsifiers such as ethyl alcohol, isopropyl alcohol, ethyl carbonate, ethyl 
acetate, benzyl alcohol, benzyl benzoate, propylene glycol, 1,3-butylene glycol, 
dimethylformamide, oils (in particular, cottonseed, groundnut, corn, germ, olive, 
castor, and sesame oils), glycerol, tetrahydrofurfuryl alcohol, polyethylene glycols 
and fatty acid esters of sorbitan, and mixtures thereof. Besides inert diluents, the oral 
compositions can also include adjuvants such as wetting agents, emulsifying and 
suspending agents, sweetening, flavoring, and perfuming agents. 

Injectable preparations, for example, sterile injectable aqueous or oleaginous 
suspensions, may be formulated according to the known art using suitable dispersing 
or wetting agents and suspending agents. The sterile injectable preparation may also 
be a sterile injectable solution, suspension or emulsion in a nontoxic parenterally 
acceptable diluent or solvent, for example, as a solution in 1,3-butanediol. Among the 
acceptable vehicles and solvents that may be employed are water, Ringer's solution, 
U.S.P. and isotonic sodium chloride solution. In addition, sterile, fixed oils are 
conventionally employed as a solvent or suspending medium. For this purpose any 
bland fixed oil can be employed including synthetic mono- or diglycerides. In 
addition, fatty acids such as oleic acid are used in the preparation of injectables. 

The injectable formulations can be sterilized, for example, by filtration 
through a bacterial-retaining filter, or by incorporating sterilizing agents in the form of 
sterile solid compositions which can be dissolved or dispersed in sterile water or other 
sterile injectable medium prior to use. 

In order to prolong the effect of a drug, it is often desirable to slow the 
absorption of the drug from subcutaneous or intramuscular injection. This may be 
accomplished by the use of a liquid suspension of crystalline or amorphous material 
with poor water solubility. The rate of absorption of the drug then depends upon its 
rate of dissolution which, in turn, may depend upon crystal size and crystalline form. 
Alternatively, delayed absorption of a parenterally administered drug form is 
accomplished by dissolving or suspending the drug in an oil vehicle. Injectable depot 
forms are made by forming microencapsulated matrices of the drug in biodegradable 
polymers such as polylactide-polyglycolide. Depending upon the ratio of drug to 
polymer and the nature of the particular polymer employed, the rate of drug release 
can be controlled. Examples of other biodegradable polymers include 
poly(orthoesters) and poly(anhydrides). Depot injectable formulations are also 
prepared by entrapping the drug in liposomes or microemulsions which are 
compatible with body tissues. 

Compositions for rectal or vaginal administration are preferably suppositories 
which can be prepared by mixing the compounds of this invention with suitable non- 
irritating excipients or carriers such as cocoa butter, polyethylene glycol or a 
suppository wax which are solid at ambient temperature but liquid at body 
temperature and therefore melt in the rectum or vaginal cavity and release the active 
compound. 

Solid dosage forms for oral administration include capsules, tablets, pills, 
powders, and granules. In such solid dosage forms, the active compound is mixed 
with at least one inert, pharmaceutically acceptable excipient or carrier such as sodium 
citrate or dicalcium phosphate and/or a) fillers or extenders such as starches, lactose, 
sucrose, glucose, mannitol, and silicic acid, b) binders such as, for example, 
carboxymethylcellulose, alginates, gelatin, polyvinylpyrrolidone, sucrose, and 
acacia, c) humectants such as glycerol, d) disintegrating agents such as agar-agar, 
calcium carbonate, potato or tapioca starch, alginic acid, certain silicates, and sodium 
carbonate, e) solution retarding agents such as paraffin, f) absorption accelerators such 
as quaternary ammonium compounds, g) wetting agents such as, for example, cetyl 
alcohol and glycerol monostearate, h) absorbents such as kaolin and bentonite clay, 
and i) lubricants such as talc, calcium stearate, magnesium stearate, solid polyethylene 
glycols, sodium lauryl sulfate, and mixtures thereof. In the case of capsules, tablets 
and pills, the dosage form may also comprise buffering agents. 

Solid compositions of a similar type may also be employed as fillers in soft 
and hard-filled gelatin capsules using such excipients as lactose or milk sugar as well 
as high molecular weight polyethylene glycols and the like. The solid dosage forms of 
tablets, dragees, capsules, pills, and granules can be prepared with coatings and shells 
such as enteric coatings and other coatings well known in the pharmaceutical 
formulating art. They may optionally contain opacifying agents and can also be of a 
composition that they release the active ingredient(s) only, or preferentially, in a 
certain part of the intestinal tract, optionally, in a delayed manner. Examples of 
embedding compositions that can be used include polymeric substances and waxes. 

The active compounds can also be in micro-encapsulated form with one or 
more excipients as noted above. The solid dosage forms of tablets, dragees, capsules, 
pills, and granules can be prepared with coatings and shells such as enteric coatings, 
release controlling coatings, and other coatings well known in the pharmaceutical 
formulating art. In such solid dosage forms the active compound may be admixed 
with at least one inert diluent such as sucrose, lactose or starch. Such dosage forms 
may also comprise, as is normal practice, additional substances other than inert 
diluents, e.g., tableting lubricants and other tableting aids such as magnesium stearate 
and microcrystalline cellulose. In the case of capsules, tablets and pills, the dosage 
forms may also comprise buffering agents. They may optionally contain opacifying 
agents and can also be of a composition that they release the active ingredient(s) only, 
or preferentially, in a certain part of the intestinal tract, optionally, in a delayed 
manner. Examples of embedding compositions that can be used include polymeric 
substances and waxes. 

Dosage forms for topical or transdermal administration of a compound of this 
invention include ointments, pastes, creams, lotions, gels, powders, solutions, sprays, 
inhalants or patches. The active component is admixed under sterile conditions with a 
pharmaceutically acceptable carrier and any needed preservatives or buffers as may be 
required. Ophthalmic formulations and ear drops are also contemplated as being within 
the scope of this invention. The ointments, pastes, creams and gels may contain, in 
addition to an active compound of this invention, excipients such as animal and 
vegetable fats, oils, waxes, paraffins, starch, tragacanth, cellulose derivatives, 
polyethylene glycols, silicones, bentonites, silicic acid, talc and zinc oxide, or 
mixtures thereof. Powders and sprays can contain, in addition to the compounds of 
this invention, excipients such as lactose, talc, silicic acid, aluminum hydroxide, 
calcium silicates and polyamide powder, or mixtures of these substances. Sprays can 
additionally contain propellants known in the art such as chlorofluorohydrocarbons. 
Transdermal patches have the added advantage of providing controlled 
delivery of a compound to the body. Such dosage forms can be made by dissolving or 
dispensing the compound in the proper medium. Absorption enhancers can also be 
used to increase the flux of the compound across the skin. The rate can be controlled 
by either providing a rate controlling membrane or by dispersing the compound in a 
polymer matrix or gel. 

In yet another aspect, the present invention also provides a pharmaceutical
pack or kit comprising one or more containers filled with one or more of the
ingredients of the pharmaceutical compositions of the invention, and in certain
embodiments, includes an additional approved therapeutic agent for use as a
combination therapy. Optionally associated with such container(s) can be a notice in
the form prescribed by a governmental agency regulating the manufacture, use or sale
of pharmaceutical products, which notice reflects approval by the agency of
manufacture, use or sale for human administration. Instructions for use of the
compound(s) may also be included.

According to the methods of treatment of the present invention, cancer,
particularly breast cancer, is treated or prevented in a patient such as a human or other
mammal by administering to the patient a therapeutically effective amount of a
compound of the invention, in such amounts and for such time as is necessary to
achieve the desired result. By a "therapeutically effective amount" of a compound of
the invention is meant a sufficient amount of the compound to treat (e.g., to ameliorate
the symptoms of, delay progression of, prevent recurrence of, cure, etc.) cancer,
particularly breast cancer, at a reasonable benefit/risk ratio, which involves a
balancing of the efficacy and toxicity of the compound. In general, therapeutic
efficacy and toxicity may be determined by standard pharmacological procedures in
cell cultures or with experimental animals, e.g., by calculating the ED50 (the dose that
is therapeutically effective in 50% of the treated subjects) and the LD50 (the dose that
is lethal to 50% of treated subjects). The ratio LD50/ED50 represents the therapeutic
index of the compound. Although in general drugs having a large therapeutic index are
preferred, as is well known in the art, a smaller therapeutic index may be acceptable in
the case of a serious disease, particularly in the absence of alternative therapeutic
options. Ultimate selection of an appropriate range of doses for administration to
humans is determined in the course of clinical trials.
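
As an illustration of the therapeutic index discussed above, the following minimal sketch computes the index as LD50/ED50, the conventional definition under which a larger value indicates a wider margin between effective and lethal doses. The function name and the example doses are hypothetical and are not data from this application.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return LD50/ED50 for two doses given in the same units (e.g., mg/kg)."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Hypothetical doses, for illustration only (not data from this application).
index = therapeutic_index(ld50=500.0, ed50=25.0)
print(f"therapeutic index = {index:.1f}")
```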

It will be understood that the total daily usage of the compounds and
compositions of the present invention for any given patient will be decided by the
attending physician within the scope of sound medical judgment. The specific
therapeutically effective dose level for any particular patient will depend upon a
variety of factors including the disorder being treated and the severity of the disorder;
the activity of the specific compound employed; the specific composition employed;
the age, body weight, general health, sex and diet of the patient; the time of
administration, route of administration, and rate of excretion of the specific compound
employed; the duration of the treatment; drugs used in combination or coincidental
with the specific compound employed; and like factors well known in the medical
arts.

The total daily dose of the compounds of this invention administered to a
human or other mammal in single or in divided doses can be in amounts, for example,
from 0.01 to 50 mg/kg body weight or more usually from 0.1 to 25 mg/kg body
weight. Single dose compositions may contain such amounts or submultiples thereof
to make up the daily dose. In general, treatment regimens according to the present
invention comprise administration to a patient in need of such treatment from about
0.1 ng to about 2000 mg of the compound(s) of the invention per day in single or
multiple doses.

EXAMPLES

Note: A numbered list of references for the Examples appears following the 
Examples, all of which are incorporated herein by reference. 

Example 1

Preparation of Microarrays Containing 8498 Human cDNAs

The human cDNA clones used in this study were obtained from Research Genetics
(Huntsville, AL, USA) as bacterial colonies in 96-well microtiter plates. The clones
were chosen from a set of 15,000 cDNA clones that corresponded to the Research
Genetics Human Gene Filters sets GF200-202 (http://www.resgen.com/). These
clones form part of a set of clones assembled by the I.M.A.G.E. consortium (Lennon,
G.G., Auffray, C., Polymeropoulos, M., Soares, M.B. The I.M.A.G.E. Consortium:
An Integrated Molecular Analysis of Genomes and their Expression. Genomics
33:151-152, 1996) and are identified by I.M.A.G.E. clone ID numbers. All clones
printed on these arrays were sequence validated as part of a product offered at
Research Genetics, Inc. We estimate that greater than 97% of the clones on the array
are correctly identified.

A detailed protocol for the production of the cDNA microarrays used in this
study is available at http://cmgm.stanford.edu/pbrown/protocols.html and is
reproduced below with insubstantial changes. As described below, the protocol
includes steps of (1) cleaning the glass slides onto which the DNAs (e.g., products of
PCR reactions) are to be spotted; (2) spotting the DNAs onto the glass slides with an
arrayer; and (3) post-processing to prepare arrays containing spotted DNAs for
hybridization. All procedures are done at room temperature and with double distilled
water unless otherwise stated. Unless otherwise stated, in this Example and the
following Examples, reagents are prepared according to protocols available in
Maniatis, T., Sambrook, J. and Fritsch, E., Molecular Cloning: A Laboratory Manual
(3 Volume Set), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 1989, the
contents of which are herein incorporated by reference.

Cleaning Slides

Use 30 slide racks in 350mL glass dishes

1. Dissolve 50g of NaOH pellets into 150ml ddH2O
2. Add 200ml of 95% EtOH, stir until completely mixed
3. If solution remains cloudy, add ddH2O until clear
4. Pour solution into glass slide box.
5. Drop in 30 slides in a metal rack. (Gold Seal slides, Cat. 3010)
6. Let soak on an orbital shaker for at least two hours
7. Rinse slides by transferring rack to slide dish filled with ddH2O
8. Repeat ddH2O rinses x3. It's important to remove all traces of the NaOH-ethanol.
9. Prepare poly-l-lysine solution; use Sigma poly-l-lysine solution, Cat. No. 8920
10. Add 70mL poly-l-lysine to 280ml of water
11. Transfer slides to poly-l-lysine solution and let soak for 1 hour.
12. Remove excess liquid from slides by spinning the rack of slides on microtiter plate
    carriers at 500rpm.
13. Dry slides at 40 degrees C for 5 minutes in a vacuum oven.
14. Store slides in a closed box for at least two weeks prior to use.
15. Before printing arrays, check a sample slide to make sure it's hydrophobic
    (water should bead off it) but the lysine coating is not turning opaque.

Arraying

1. Transfer PCR reactions to 96-well V-bottom tissue culture plates (Costar).
   Add 1/10 vol. 3M sodium acetate (pH 5.2) and equal volume isopropanol. Store at
   -20 C for a few hours.
2. Centrifuge in Sorvall at 3500 RPM for 45 min. Rinse with 70% EtOH,
   centrifuge again and dry.
3. Resuspend DNA in 12ul 3X SSC for a few hours and transfer to flexible
   U-bottom printing plates.
4. Spot DNA onto poly-l-lysine slides with an arrayer.

Post processing

1. Rehydrate arrays by suspending slides over a dish of warm ddH2O. (~1 minute)
2. Snap-dry each array (DNA side up) on a 100C hot plate for 3 seconds.
3. UV cross-link DNA to the glass by using a Stratalinker set for 60 milliJoules
4. Dissolve 5g of succinic anhydride (Aldrich) in 315mL of
   n-methyl-pyrrolidinone.
5. To this, add 35mL of 0.2M NaBorate pH 8.0 (made by dissolving boric acid in
   water and adjusting the pH with NaOH), and stir until dissolved.
6. Soak arrays in this solution for 15 minutes with shaking.
7. Transfer arrays to 95C water bath for 2 minutes
8. Quickly transfer arrays to 95% EtOH for 1 minute.
9. Remove excess liquid from slides by spinning the rack of slides on microtiter plate
   carriers at 500rpm.
10. Arrays can be used immediately.
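
The composition of the blocking solution prepared in steps 4 and 5 can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the volumes of n-methyl-pyrrolidinone and borate buffer are simply additive.

```python
def blocking_solution(anhydride_g=5.0, nmp_ml=315.0,
                      borate_ml=35.0, borate_stock_molar=0.2):
    """Summarize the succinic anhydride blocking solution of steps 4 and 5.

    Assumes the two liquid volumes are additive (an approximation).
    """
    total_ml = nmp_ml + borate_ml
    return {
        "total_ml": total_ml,
        # C1 * V1 = C2 * V2 for the borate buffer
        "borate_final_molar": borate_stock_molar * borate_ml / total_ml,
        "anhydride_g_per_l": anhydride_g / (total_ml / 1000.0),
    }


for name, value in blocking_solution().items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the 350 mL of solution fills one of the 350 mL glass dishes mentioned in the slide-cleaning protocol.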

Reagent Suppliers

Microscope slides         Gold Seal brand (Cat. 3010)
Poly-l-lysine solution    Sigma product number P8920
Succinic Anhydride        Aldrich product number 23,969-0
N-Methyl-Pyrrolidinone    Aldrich product number 32,863-4

Microarrays were prepared according to the above protocol using the 8498 
cDNA clones described above. All microarrays used in the experiments described 
herein were from a single print run batch of microarrays. 

Example 2

Cell Lines, Breast Tissue, and Breast Tumor Samples for Microarray Analysis and
Preparation of mRNA Samples

Common Reference Sample

Each of the 84 experimental samples tested here was analyzed by a
comparative hybridization, using a common reference RNA pool as a standard; this
reference sample was composed of equal mixtures of mRNA isolated from 11
established cell lines derived from human tissue (MCF7, Hs578T, OVCAR3, HepG2,
NTERA2, MOLT4, RPMI-8226, NB4+ATRA, UACC-62, SW872, and Colo205; also
see Table 2 for more details). The 11 cell lines were all grown to 70-90% confluence
in RPMI medium, containing 10% Fetal Calf Serum and Penicillin/Streptomycin. The
cells were harvested either by scraping or centrifugation, quickly resuspended in RNA
lysis buffer, and mRNA was prepared using the FastTrack™ 2.0 mRNA Isolation Kit
(Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. In each case,
multiple individual mRNA preparations were collected for each cell line, which were
then pooled together and analyzed via Northern analysis before final mixing to ensure
the quality of the input mRNAs (e.g., to confirm that the mRNA exhibited a size
distribution indicating that it was substantially nondegraded). The 11 mRNA samples
were then mixed together in equal amounts, aliquoted in 10mM Tris (pH 7.4), and stored
at -80 C until use (2 micrograms of common reference sample was used per
microarray hybridization and was always labeled using Cy3).
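
The amount of reference mRNA needed for a series of hybridizations follows directly from the figures above. The sketch below is a hypothetical helper, not part of the original protocol; it assumes no allowance for pipetting losses or repeat hybridizations.

```python
CELL_LINES = ["MCF7", "Hs578T", "OVCAR3", "HepG2", "NTERA2", "MOLT4",
              "RPMI-8226", "NB4+ATRA", "UACC-62", "SW872", "Colo205"]


def reference_pool(n_hybridizations, ug_per_hybridization=2.0,
                   cell_lines=CELL_LINES):
    """Return total reference mRNA (micrograms) and the equal share per cell line."""
    total_ug = n_hybridizations * ug_per_hybridization
    per_line_ug = total_ug / len(cell_lines)
    return total_ug, {name: per_line_ug for name in cell_lines}


# 84 experimental samples, each hybridized against 2 micrograms of reference.
total, per_line = reference_pool(84)
print(f"total reference mRNA: {total:.0f} micrograms")
print(f"per cell line: {per_line['MCF7']:.2f} micrograms")
```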

Normal Breast Tissue

Three samples of normal breast tissue were analyzed. Two of the samples
were obtained from Clontech (Palo Alto, CA) and were pools of six (Normal1) or two
(Normal2) whole normal breasts. The third sample (Normal3) was obtained from a
single individual.

Breast Tumor Samples

The 40 individual breast tumor samples were collected at either Stanford
University in Stanford, CA, USA, or in the Haukeland University Hospital in Bergen,
Norway. Twenty of the forty breast tumors were sampled twice as part of a larger
Norwegian study on locally advanced breast cancers (T3/T4 and/or N2 tumors) and
have been described previously (Aas, T., et al., Nat Med, 2, 811-814, 1996, the
contents of which are incorporated herein by reference); these patients underwent an
open surgical biopsy before treatment with doxorubicin monotherapy (range 12-23
weeks), followed by the definitive surgical resection of the remaining tumor after
therapy, and were evaluated for clinical responses according to UICC criteria
(Hayward, J., et al., Br. J. Cancer, 35, 292-298, 1977). In addition to the 20 pairs,
there were 8 additional "before" specimens from Norway and 10 tumor specimens
from Stanford (all Stanford tumors tested had a diameter of 3 cm or larger). Finally, 2
of the 10 Stanford tumor specimens assayed were also paired with a lymph node
metastasis from the same patient.

mRNA Isolation from Breast Tumor and Tissue Samples

Following their excision, breast tumor samples were rapidly frozen in liquid
N2 and then stored at -80 C until use. mRNA was isolated from breast tumors and
normal breast tissue using the Trizol Reagent (Gibco-BRL) and Invitrogen FastTrack
2.0 Kit (all Stanford samples, and see
http://genome-www.stanford.edu/sbcmp/web.shtml for the detailed protocol) or using
the Trizol Reagent followed by Dynal bead separation for the mRNA purification step
(all Norway tissue samples). Briefly, frozen tumor samples were cut into small pieces and
immediately placed into 12 ml of Trizol Reagent. Each tumor sample in Trizol was
homogenized using a PowerGen 125 Tissue Homogenizer (Fisher Scientific), and
total RNA was isolated according to the Trizol reagent manufacturer's protocol.
Tumor mRNA was isolated according to the manufacturer's protocols using the
FastTrack 2.0 Kit (Invitrogen) or Dynal beads.

Example 3

Characterization of Breast Tissue and Tumor Samples

For all but two of the tumor specimens (i.e., New York 1 and New York 2), the
mutational status of the TP53 gene was determined using published methods (Aas, T.,
et al.).

A single pathologist (applicant Matt van de Rijn) reviewed hematoxylin and
eosin (H&E) sections of each tumor, including all before and after pairs, and made a
histological evaluation of each while blinded to the source. Tumors were graded using
a modified version of the Bloom-Richardson method (Robbins, P., et al., Hum Pathol,
26, 873-879, 1995). These data are displayed in Table 4. Representative H&E
sections of each tumor are posted on Applicants' website at
http://genome-www.stanford.edu/molecularportraits/.

Immunohistochemistry was performed as described previously (Perou, C., et
al., 1999; Bindl, J. and Warnke, R., Am J Clin Pathol, 85, 490-493, 1986, and
Natkunam, Y., et al., Am. J. Path., 156(1), 2000, the contents of which are
incorporated herein by reference). The antibodies used included the commercially
available monoclonal antibodies CAM5.2 (specific for keratins 8/18, available from
Becton Dickinson), anti-keratin 5/6 (available originally from Boehringer Mannheim,
Indianapolis, IN, cat. no. 1273396 and now from Chemicon International, Temecula,
CA), anti-keratin 17 (clone E3, available from Dako, Carpinteria, CA, cat. no.
M7046), anti-CD3 (available from Dako), and anti-immunoglobulin light chain
(A191, A193, available from Dako). These immunohistochemical methods were
applied for all the immunohistochemical studies described in the present application
unless otherwise stated.

Example 4

cDNA Synthesis and Labeling and Microarray Hybridization

mRNA was isolated from breast tissue, breast tumor samples, and cell lines as
described in Example 2. Fluorescently labeled cDNA was synthesized from the
mRNA using a reverse transcriptase reaction that included dUTP labeled with either
Cy3 or Cy5. For each hybridization experiment, differentially labeled cDNA samples
(an experimental sample and a reference sample) were pooled and hybridized to a
cDNA microarray, which was then scanned as described in Example 5. The protocol
below provides details of the steps performed for cDNA synthesis and labeling and
for microarray hybridization.

1. To set up for the reverse transcriptase (RT) reaction, combine the following (e.g., in
   an Eppendorf tube):
   (a) Anchored Oligo dT primer - 2 microliters at 2.5 micrograms/microliter or
       control - 2 microliters
   (b) mRNA - (whatever volume is needed to reach 1.5-2 micrograms)
   (c) DEPC/H2O - add sufficient volume so that final volume is 16 microliters
2. Heat at 70° C for 10 minutes
3. Chill on ice for 1-2 minutes
4. Add the following RT reaction components to each individual tube:
   (a) 5X RT Buffer - 6 microliters
   (b) 50X dNTPs - 0.7 microliters - (500mm A,C,G, 200mm T)
   (c) Cy Dyes dUTP - 3 microliters - (either Cy3 or Cy5)
   (d) DTT Stock - 3 microliters - (comes with RT setup)
   (e) Superscript II RT - 1.7 microliters - (cat# 18064-014 Gibco-BRL)
5. Mix well
6. Incubate at 42° C for 1 hour
7. Add another 1 microliter of Superscript II RT and mix
8. Incubate at 42° C for 1 more hour
9. Degrade mRNA with 1.5 microliters of 1M NaOH / 2mM EDTA
10. Incubate at 65° C for 8 minutes (do NOT go TOO long here)
11. Add 15 microliters of 0.1M HCl
12. Add 450 microliters of TE (pH 7.4) to each sample and place each sample into a
    microcon-30 filter.
13. Add 15 microliters of Human COT1 DNA (Gibco-BRL = 1 microgram/microliter)
    to each sample in the microcon filter.
14. Spin in Eppendorf centrifuge until volume equals about 50 microliters (8-10 minutes)
15. Remove flowthroughs, and pool Cy3 and Cy5 flowthroughs together for future
    recovery of Cy dyes (store at -20° C).
16. Invert microcons, recover labeled samples, and pool Cy3 and Cy5 samples
    together that will be used for an individual experiment, in a single microcon filter that
    was used in step 15.
17. Add 500 microliters of TE again, and spin until final volume equals 8 microliters
    or less (BE VERY CAREFUL TO NOT SPIN THE SAMPLE DRY!!!)
18. To the 8 microliter combined Cy3 + Cy5 sample, add the following:
    (a) Yeast tRNA - 1 microliter - (10 micrograms/microliter)
    (b) PolyA DNA - 2 microliters - (10 micrograms/microliter)
    (c) 20XSSC - 2 microliters - (FINAL SSC concentration approximately 3X)
    (d) 10% SDS - 0.3 microliters
    FINAL VOLUME - 13.3 MICROLITERS
19. Mix well.
20. Heat sample at 100° C for 2 minutes, spin very briefly.
21. Place samples at 42° C for 20-30 minutes.

22. During Step 21, prepare the necessary number of hybridization chambers (custom
    made by Die-Tech, San Jose, CA (see "Drawings for custom parts" at
    http://cmgm.stanford.edu/pbrown/mgu^) or purchased at Corning
    Costar, Acton, MA (CTM™ Hybridization Chamber, #2551)), get 22mm X 22mm
    coverslips ready, and get arrays ready.
23. Add the 13 microliters of probe (i.e., labeled cDNA mixture) onto the center of the
    array while NOT actually touching the array face with the pipette tip.
24. Quickly and gently place the 22mm X 22mm glass #1 coverslip onto the array
    face.
25. Add about 15-20 microliters of 3XSSC in two drops onto the end of the array slide
    away from the actual array for hydration purposes.
26. Assemble the hybridization chamber with the array slide in it, and place into a 65
    C water bath overnight.
27. Pull out the hybridization chamber and dry off the excess H2O.
28. Disassemble the hybridization chamber, and quickly place the slides into a slide
    washing chamber that contains 2XSSC/0.05%SDS. Jiggle the slide holder up and
    down until the slide coverslip falls off. Repeat this individually for each array, one at
    a time, until all are done.
29. Wash slides in 1XSSC for 3-5 minutes.
30. Wash slides in 50 C 0.2XSSC for 3-5 minutes, twice.
31. Spin slides down in centrifuge at 200 RPM for 2 minutes.
32. SCAN immediately.
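
The volumes listed in steps 1, 4, and 18 can be checked arithmetically. The following sketch is illustrative only; the dictionary keys and the helper name are hypothetical, and the calculation assumes that the concentrated labeled cDNA contributes no SSC or SDS of its own.

```python
# Volumes in microliters, taken from steps 1 and 4 (RT reaction) and step 18.
RT_REACTION_UL = {
    "primer + mRNA + DEPC water": 16.0,
    "5X RT buffer": 6.0,
    "50X dNTPs": 0.7,
    "Cy-dUTP": 3.0,
    "DTT stock": 3.0,
    "Superscript II RT": 1.7,
}
HYB_MIX_UL = {
    "labeled cDNA": 8.0,
    "yeast tRNA": 1.0,
    "polyA DNA": 2.0,
    "20X SSC": 2.0,
    "10% SDS": 0.3,
}


def final_concentration(stock, stock_ul, total_ul):
    """Dilution rule C_final = C_stock * V_stock / V_total."""
    return stock * stock_ul / total_ul


rt_total = sum(RT_REACTION_UL.values())
hyb_total = sum(HYB_MIX_UL.values())
ssc_final_x = final_concentration(20.0, HYB_MIX_UL["20X SSC"], hyb_total)
sds_final_pct = final_concentration(10.0, HYB_MIX_UL["10% SDS"], hyb_total)

print(f"RT reaction volume: {rt_total:.1f} microliters")
print(f"hybridization mix volume: {hyb_total:.1f} microliters")
print(f"final SSC: {ssc_final_x:.2f}X, final SDS: {sds_final_pct:.2f}%")
```

The hybridization mix sums to the stated 13.3 microliters, and the computed SSC concentration agrees with the "approximately 3X" noted in step 18(c).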

Example 5

Collection, Processing, and Analysis of Data from Microarray Hybridizations

The cDNA microarrays were scanned with either a General Scanning
(Watertown, MA) ScanArray 3000 at 20 microns resolution, or with a prototype Axon
Instruments (Foster City, CA) GenePix Scanner at 10 micron resolution. The output
files, which were TIFF images, were then analyzed using the program ScanAlyze (M.
Eisen; available at http://www.microarrays.org/software). Fluorescent ratios and
quantitative data on spot quality (see ScanAlyze manual) were stored in a prototype of
the AMAD database (M. Eisen; available at http://www.microarrays.org/software).
Areas of the array with obvious blemishes were manually flagged and excluded from
subsequent analyses. The primary data tables can be downloaded at
http://genome-www.stanford.edu/molecularportraits/, in text/tab delimited format after
obtaining a password.

Data were extracted from the database in a single table, with each row
representing an array element, each column a hybridization, and each cell the
observed fluorescent ratio for the array element in the appropriate hybridization.
Previously flagged spots were excluded, as were spots that did not pass quality
control. This table had 9216 rows and 84 columns. Array elements were removed if
they were not well measured in at least 80% of the hybridizations. The data table was
split into tumors and cell lines, and the two subtables were separately median polished
(the rows and columns were iteratively adjusted to have median 0) before being
rejoined into a single table. Genes whose expression varied by at least 4-fold from the
median in this sample set in at least three of the samples tested were selected for the
analyses that led to the identification of polynucleotides encoding BSTP-ECG1 (1753
genes satisfied these conditions).
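
The filtering and centering steps described above can be sketched as follows. This is a minimal illustration, not the code used in the study. It assumes that ratios are held as log2 values (so that a median of 0 corresponds to a ratio of 1 and a 4-fold change to an absolute value of 2), that missing or flagged spots are represented by None, and that a fixed number of polishing passes is adequate; all function names are hypothetical. It operates on one table at a time, so tumors and cell lines would be polished in separate calls.

```python
import math
from statistics import median


def filter_well_measured(table, min_fraction=0.8):
    """Keep rows measured (not None) in at least min_fraction of the columns."""
    return [row for row in table
            if sum(v is not None for v in row) / len(row) >= min_fraction]


def median_polish(table, n_iter=10):
    """Alternately subtract row and column medians, ignoring missing values."""
    data = [list(row) for row in table]
    for _ in range(n_iter):
        for row in data:
            present = [v for v in row if v is not None]
            if present:
                m = median(present)
                row[:] = [None if v is None else v - m for v in row]
        for j in range(len(data[0])):
            present = [row[j] for row in data if row[j] is not None]
            if present:
                m = median(present)
                for row in data:
                    if row[j] is not None:
                        row[j] -= m
    return data


def select_variable(table, fold=4.0, min_samples=3):
    """Keep rows whose centered log2 ratio reaches the fold cutoff in enough columns."""
    cutoff = math.log2(fold)
    return [row for row in table
            if sum(v is not None and abs(v) >= cutoff for v in row) >= min_samples]
```

A typical use would chain the three calls: filter the raw table, polish each subtable, rejoin, and then select the variable genes.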

Average-linkage hierarchical clustering, as implemented in the program
Cluster (M. Eisen; http://www.microarrays.org/software), was applied separately to
both the genes and arrays. The results were analyzed, and figures generated, using
TreeView (M. Eisen; http://www.microarrays.org/software).
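
For readers unfamiliar with the technique, a generic average-linkage agglomerative clustering can be sketched in a few lines. This is not the code of the Cluster program; the choice of one minus the Pearson correlation as the distance is an assumption made for illustration, the function names are hypothetical, and profiles are assumed to be complete (no missing values) and non-constant.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / sqrt(sxx * syy)


def average_linkage(profiles):
    """Cluster profiles bottom-up; return a list of (cluster_a, cluster_b, distance)."""
    leaf_dist = {(i, j): pearson_distance(profiles[i], profiles[j])
                 for i, j in combinations(range(len(profiles)), 2)}

    def cluster_dist(a, b):
        # Average of all pairwise leaf distances between the two clusters.
        total = sum(leaf_dist[tuple(sorted((i, j)))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_dist(*pair))
        merges.append((a, b, cluster_dist(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return merges


# Toy profiles: the first two rise together, the third falls.
for a, b, d in average_linkage([[1, 2, 3], [2, 4, 6.1], [3, 2, 1]]):
    print(a, b, round(d, 3))
```

Applying the same function to the rows of the data table clusters genes; applying it to the columns clusters arrays.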

Example 6

Producing Antibodies to the BSTP-ECG1 Polypeptide

This example describes the preparation of a polyclonal antibody that binds to
the BSTP-ECG1 polypeptide, i.e., an antibody that binds to a polypeptide comprising
an amino acid sequence as set forth in SEQ ID NO: 1. The example further describes
affinity purification of the antibody.

Materials

• Anisole (Cat. No. A4405, Sigma)
• 2,2'-azino-di-(3-ethyl-benzthiazoline-sulfonic acid) (ABTS) (Cat. No. A6499,
  Molecular Probes, Eugene, OR)
• Activated Maleimide Keyhole Limpet Hemocyanin (Cat. No. 77106, Pierce Chemical
  Co., Rockford, IL)
• Biotin (Cat. No. B2643, Sigma)
• Boric acid (Cat. No. B0252, Sigma)
• Sepharose 4B (Cat. No. 17-0120-01, LKB/Pharmacia, Uppsala, Sweden)
• Bovine Serum Albumin (LP) (Cat. No. 100 350, Boehringer Mannheim,
  Indianapolis, IN)
• Cyanogen bromide (Cat. No. C6388, Sigma, St. Louis, MO)
• Dialysis tubing Spectra/Por Membrane MWCO: 6-8,000 (Cat. No. 132 665,
  Spectrum Industries Inc., Laguna Hills, CA)
• Dimethyl formamide (DMF) (Cat. No. 22705-6, Aldrich Chemical Company,
  Milwaukee, WI)
• DIC (Cat. No. BP 592-500, Fisher)
• Ethanedithiol (Cat. No. 39,802-0, Aldrich Chemicals, Milwaukee, WI)
• Ether (Cat. No. TX 1275-3, EM Sciences)
• Ethylenediaminetetraacetic acid (EDTA) (Cat. No. BP 1204, Fisher Scientific,
  Springfield, NJ)
• 1-ethyl-3-(3'-dimethylaminopropyl)-carbodiimide, HCl (EDC) (Cat. No. 341-006,
  Calbiochem, San Diego, CA)
• Freund's Adjuvant, complete (Cat. No. M-0638-50B, Lee Laboratories, Grayson,
  GA)
• Freund's Adjuvant, incomplete (Cat. No. M0639-50B, Lee Laboratories)
• Fritted chromatography columns (Column part No. 12131011; Frit: Part No.
  12131029, Varian Sample Preparation Products, Harbor City, CA)
• Gelatin from Bovine Skin (Cat. No. G9382, Sigma)
• Glycine (Cat. No. BP381-5, Fisher)
• Goat anti-rabbit IgG, biotinylated (Cat. No. A 0418, Sigma)
• HOBt (Cat. No. 01-62-0008, Calbiochem-Novabiochem)
• Horseradish peroxidase (HRP) (Cat. No. 814 393, Boehringer Mannheim)
• HRP-Streptavidin (Cat. No. S 5512, Sigma)
• Hydrochloric Acid (Cat. No. 71445-500, Fisher)
• Hydrogen Peroxide 30% w/w (Cat. No. H1009, Sigma)
• Methanol (Cat. No. A412-20, Fisher)
• Microtiter plates, 96 well (Cat. No. 2595, Corning-Costar, Pleasanton, CA)
• N-α-Fmoc protected amino acids available from Calbiochem-Novabiochem, San
  Diego, CA. See 1997-1998 catalog pages 1-45.
• N-α-Fmoc protected amino acids attached to Wang Resin available from
  Calbiochem-Novabiochem. See 1997-1998 catalog pages 161-164.
• NMP (Cat. No. CAS 872-50-4, Burdick and Jackson, Muskegon, MI)
• Peptide (Synthesized by Research Genetics, Inc. Details given below)
• Piperidine (Cat. No. 80640, Fluka, available through Sigma)
• Sodium Bicarbonate (Cat. No. BP328-1, Fisher)
• Sodium Borate (Cat. No. B9876, Sigma)
• Sodium Carbonate (Cat. No. BP357-1, Fisher)
• Sodium Chloride (Cat. No. BP 358-10, Fisher)
• Sodium Hydroxide (Cat. No. SS 255-1, Fisher)
• Streptavidin (Cat. No. 1 520, Boehringer Mannheim)
• Thioanisole (Cat. No. T-2765, Sigma)
• Trifluoroacetic acid (Cat. No. TX 1275-3, EM Sciences)
• Tween-20 (Cat. No. BP 337-500, Fisher)
• Wetbox (Rubbermaid Rectangular Servin' Saver™, Part No. 3862, Wooster, OH)

Solutions

• BBS - Borate Buffered Saline with EDTA dissolved in distilled water (pH 8.2 to
  8.4 with HCl or NaOH)
  - 25 mM Sodium borate (Borax)
  - 100 mM Boric Acid
  - 75 mM NaCl
  - 5 mM EDTA
• 0.1 N HCl in saline
  - concentrated HCl (8.3 mL/0.917 L distilled water)
  - 0.154 M NaCl
• Glycine (pH 2.0 and pH 3.0) dissolved in distilled water and adjusted to the
  desired pH.
  - 0.1 M glycine
  - 0.154 M NaCl
• 5X Borate 1X Sodium Chloride dissolved in distilled water.
  - 0.11 M NaCl
  - 60 mM Sodium Borate
  - 250 mM Boric Acid
• Substrate Buffer in distilled water adjusted to pH 4.0 with sodium hydroxide:
  - 50 to 100 mM Citric Acid



Peptide Synthesis Solutions

• AA solution: HOBt is dissolved in NMP (8.8 grams HOBt to 1 liter NMP).
  Fmoc-N-α-amino acid is added to a concentration of 0.53 M.
• DIC solution: 1 part DIC to 3 parts NMP.
• Deprotecting solution: 1 part Piperidine to 3 parts DMF
• Reagent R: 2 parts anisole, 3 parts ethanedithiol, 5 parts thioanisole, 90 parts
  trifluoroacetic acid.
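
Recipes given in parts, such as Reagent R, scale to any batch size by dividing the total volume in proportion to the parts. The sketch below is illustrative; the helper name and the 10 mL batch size are hypothetical, and the parts are assumed to be parts by volume.

```python
REAGENT_R_PARTS = {
    "anisole": 2,
    "ethanedithiol": 3,
    "thioanisole": 5,
    "trifluoroacetic acid": 90,
}


def volumes_from_parts(total_ml, parts):
    """Split total_ml among components in proportion to their parts (by volume)."""
    total_parts = sum(parts.values())
    return {name: total_ml * p / total_parts for name, p in parts.items()}


# Hypothetical 10 mL batch of Reagent R.
for name, ml in volumes_from_parts(10.0, REAGENT_R_PARTS).items():
    print(f"{name}: {ml:.2f} mL")
```

The same helper applies to the DIC solution (1 part DIC to 3 parts NMP) and the deprotecting solution (1 part piperidine to 3 parts DMF).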



Equipment

• MRX Plate Reader (Dynatech Inc., Chantilly, VA)
• Hamilton Eclipse (Hamilton Instruments, Reno, NV)
• Beckman TJ-6 Centrifuge, Refrigerated (Model No. TJ-6, Beckman Instruments,
  Fullerton, CA)
• Chart Recorder (Recorder 1 Part No. 18-1001-40, Pharmacia LKB Biotechnology)
• UV Monitor (Uvicord SII Part No. 18-1004-50, Pharmacia LKB Biotechnology)
• Amicon Stirred Cell Concentrator (Model 8400, Amicon Inc., Beverly, MA)
• 30 kD MW cut-off filter (YM-30 Membranes, Cat. No. 13742, Amicon
  Inc., Beverly, MA)
• Multi-channel Automated Pipettor (Cat. No. 4880, Corning Costar Inc.,
  Cambridge, MA)
• pH Meter Corning 240 (Corning Science Products, Corning Glassworks, Corning,
  NY)
• ACT396 peptide synthesizer (Advanced ChemTech, Louisville, KY)
• Vacuum dryer (Box is from Labconco, Kansas City, MO; Pump is from Alcatel,
  Laurel, MD).
• Lyophilizer (Unitop 600sl in tandem with Freezemobile 12, both from Virtis,
  Gardiner, NY)

Methods

Peptides were selected using the program Omiga™ 1.1 (Oxford Molecular Group,
Inc., 2105 So. Bascom Ave., Suite 200, Campbell, CA 95008) using the Hopp/Woods
method, which is described in Hopp TP, Woods KR, A computer program for
predicting protein antigenic determinants, Mol Immunol 20(4):483-9, 1983, and Hopp
TP and Woods KR, Proc. Natl. Acad. Sci. U.S.A. 78, 3824-3828, 1981. Three peptide
sequences were selected. The sequences were selected from regions of the
polypeptide that displayed minimal homology with known proteins. The sequences of
the three peptides were as follows:

Peptide 1 (SEQ ID NO: 6): KYIGFAPCIFHGRGLFSS
Peptide 2 (SEQ ID NO: 7): ESLSSMPGKNAVTLR
Peptide 3 (SEQ ID NO: 8): NRKGFVKLALRHGAD
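
The Hopp/Woods method scores each residue with a hydrophilicity value and averages the scores over a sliding window; peaks in the resulting profile suggest likely antigenic regions. The sketch below illustrates the idea using the hydrophilicity values published by Hopp and Woods (1981) and a six-residue window. It is not the Omiga implementation, whose settings are not stated here, and the function name is hypothetical.

```python
# Hopp and Woods (1981) hydrophilicity values, one-letter amino acid codes.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hopp_woods_profile(sequence, window=6):
    """Return the mean hydrophilicity of each window along the sequence."""
    scores = [HOPP_WOODS[aa] for aa in sequence.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


# Profile of Peptide 2 (SEQ ID NO: 7).
profile = hopp_woods_profile("ESLSSMPGKNAVTLR")
best = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophilic window starts at residue {best + 1}")
```

In practice the profile would be computed over the full-length polypeptide rather than over an already selected peptide.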

Synthesis of Peptides

Each of the three peptides listed above was synthesized according to the following
protocol:

Incubate: Resin was immersed in appropriate solution. All incubation steps occurred
with mixing.
Wash: Added 2 mL DMF, incubated 5 minutes and drained.
Wash Cycle: Five washes.

Machine Synthesis

The sequence of the desired peptide was provided to the peptide synthesizer. The
C-terminal residue was determined and the appropriate Wang Resin was attached to the
reaction vessel. The peptides were synthesized C-terminus to N-terminus by adding
one amino acid at a time using a synthesis cycle. Which amino acid is added was
controlled by the peptide synthesizer, which looks to the sequence of the peptide
entered into its database.

Step 1 - Resin Swelling: Added 2 mL DMF, incubated 30 minutes, drained DMF.
Step 2 - Synthesis cycle
    2a - Deprotection: 1 mL deprotecting solution was added to the reaction
         vessel and incubated for 20 minutes.
    2b - Wash Cycle
    2c - Coupling: 750 µL of amino acid solution and 250 µL of DIC solution
         were added to the reaction vessel. The reaction vessel was incubated for
         thirty minutes and washed once. The coupling step was repeated once.
    2d - Wash Cycle
Step 2 was repeated over the length of the peptide. The amino acid solution changed
as the sequence listed in the peptide synthesizer dictated.
Step 3 - Final Deprotection: Steps 2a and 2b were performed one last time.
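
Because synthesis proceeds from the C-terminus to the N-terminus, the order in which residues are coupled is the reverse of the written sequence, with the C-terminal residue supplied on the resin. The sketch below illustrates this ordering; the function name is hypothetical and the code does not model the synthesizer's own software.

```python
def coupling_order(sequence):
    """Return (resin-bound C-terminal residue, residues in the order coupled)."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return seq[-1], list(reversed(seq[:-1]))


# Peptide 2 (SEQ ID NO: 7), written N-terminus to C-terminus.
resin_residue, cycles = coupling_order("ESLSSMPGKNAVTLR")
print(f"resin carries {resin_residue}; {len(cycles)} synthesis cycles follow")
print("coupling order:", "".join(cycles))
```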

Resins were deswelled in methanol (rinsed twice in 5 mL methanol, incubated 5
minutes in 5 mL methanol, rinsed in 5 mL methanol) and then vacuum dried.

Peptide was removed from the resin by incubating 2 hours in reagent R and then
precipitated into ether. Peptide was washed in ether and then vacuum dried. Peptide
was resolubilized in diH2O, frozen, and lyophilized overnight.

Conjugation of Peptide with Keyhole Limpet Hemocyanin

Peptide (6 mg) was dissolved in PBS (6 mL) and mixed with 6 mg of maleimide
activated KLH carrier in 6 mL of PBS for a total volume of 12 mL. The entire
solution was mixed for two hours, dialyzed in 1L PBS, and lyophilized.

Immunization of Rabbits

Two New Zealand White Rabbits were injected with 250 µg keyhole limpet
hemocyanin (KLH) conjugated peptide in an equal volume of complete Freund's
adjuvant and saline in a total volume of 1 mL. Antigens (KLH-Peptide, 100 µg each)
in an equal volume of incomplete Freund's Adjuvant and saline were injected into
three to four subcutaneous dorsal sites for a total volume of 1 mL two, four, and six
weeks after the first immunization. The three peptides were injected together.

The immunization schedule was as follows:

Day 0     Pre-immune bleed, primary immunization
Day 15    1st Boost
Day 27    1st Bleed
Day 44    2nd Boost
Day 57    2nd Bleed and 3rd Boost
Day 69    3rd Bleed
Day 84    4th Boost
Day 98    4th Bleed



The Collection of Rabbit Serum

The rabbits were bled (30 to 50 mL) from the auricular artery. The blood was allowed
to clot at room temperature for 15 minutes and the serum was separated from the clot
using an IEC DPR-6000 centrifuge at 5000 x g. Cell-free serum was decanted gently
into a clean test tube and stored at -20°C for affinity purification.

15 

Determination of Antibody Titer 

All solutions with the exception of wash solution were added by the Hamilton
Eclipse, a liquid handling dispenser. The antibody titer was determined in the rabbits
using an ELISA assay with peptide on the solid phase. Flexible high binding ELISA
plates were passively coated with peptide diluted in BBS (100 µL, 1 ng/well) and the
plates were incubated overnight at 4°C in a wetbox (an air-tight container with
moistened cotton balls). The plates were emptied and then washed three times with BBS
containing 0.1% Tween-20 (BBS-TW) by repeated filling and emptying using a
semi-automated plate washer. The plates were blocked by completely filling each well with
BBS-TW containing 1% BSA and 0.1% gelatin (BBS-TW-BG) and incubating for 2
hours at room temperature. The plates were emptied and both pre-immune and
post-immune sera were added to the wells. The first well contained serum at 1:50 in BBS.
The sera were then serially titrated eleven more times across the plate at a ratio of 1:1
for a final (twelfth) dilution of 1:204,800. The plates were incubated overnight at
4°C. The plates were emptied and washed three times as described.
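
The titration series described above can be sketched as follows. This is a minimal illustration, assuming that each 1:1 transfer doubles the dilution; the function name is hypothetical. Under that strict reading, twelve wells starting at 1:50 end at 1:102,400, so the stated final dilution of 1:204,800 would correspond to one further doubling, and the exact plate layout remains an assumption.

```python
def twofold_series(start_denominator, n_wells):
    """Return dilution denominators for a two-fold serial titration.

    Assumes every transfer mixes one part sample with one part diluent,
    so each successive well is twice as dilute as the previous one.
    """
    return [start_denominator * 2 ** i for i in range(n_wells)]


# First well at 1:50, followed by eleven further two-fold steps.
series = twofold_series(50, 12)
for well, denominator in enumerate(series, start=1):
    print(f"well {well:2d}: 1:{denominator:,}")
```

The titer would then typically be read as the last dilution in this series that still gives a signal above the pre-immune control.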

Biotinylated goat anti-rabbit IgG (100 µL) was added to each microtiter plate test well
and incubated for four hours at room temperature. The plates were emptied and
washed three times. Horseradish peroxidase-conjugated streptavidin (100 µL, diluted
1:10,000 in BBS-TW-BG) was added to each well and incubated for two hours at
room temperature. The plates were emptied and washed three times. The ABTS was
prepared fresh from stock by combining 10 mL of citrate buffer (0.1 M at pH 4.0), 0.2
mL of the stock solution (15 mg/mL in water) and 10 µL of 30% H2O2. The ABTS
solution (100 µL) was added to each well and incubated at room temperature. The
plates were read at 414 nm, 20 minutes following the addition of substrate.
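
As a rough check on the substrate recipe above, the final concentrations in the working solution can be estimated as sketched below. The sketch assumes the three volumes are simply additive; the variable names are illustrative and not part of the protocol.

```python
# Volumes and concentrations taken from the ABTS recipe in the text.
citrate_ml = 10.0        # 0.1 M citrate buffer, pH 4.0
stock_ml = 0.2           # ABTS stock solution
stock_mg_per_ml = 15.0   # ABTS stock concentration
peroxide_ml = 0.010      # 10 microliters of hydrogen peroxide
peroxide_pct = 30.0      # hydrogen peroxide stock strength (%)

# Assumption: volumes are additive on mixing.
total_ml = citrate_ml + stock_ml + peroxide_ml

abts_mg_per_ml = stock_ml * stock_mg_per_ml / total_ml
h2o2_pct = peroxide_ml * peroxide_pct / total_ml

# Number of wells one batch can fill at 100 microliters per well.
wells_per_batch = int(total_ml * 1000 // 100)

print(f"ABTS: {abts_mg_per_ml:.3f} mg/mL, H2O2: {h2o2_pct:.4f} %")
print(f"wells per batch: {wells_per_batch}")
```

One batch therefore covers slightly more than a single 96-well plate, so the recipe would be scaled when several plates are developed together.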

Preparation of the Peptide Affinity Purification Column:

The affinity column was prepared by conjugating 5 mg of peptide to 10 mL of
cyanogen bromide-activated Sepharose 4B, and 5 mg of peptide to hydrazine-Sepharose
4B. Briefly, 100 µL of DMF was added to peptide (5 mg) and the mixture
was vortexed until the contents were completely wetted. Water was then added (900
µL) and the contents were vortexed until the peptide dissolved. Half of the dissolved
peptide (500 µL) was added to separate tubes containing 10 mL of cyanogen
bromide-activated Sepharose 4B in 0.1 mL of borate buffered saline at pH 8.4 (BBS), and 10
mL of hydrazine-Sepharose 4B in 0.1 M carbonate buffer adjusted to pH 4.5 using
excess EDC in citrate buffer pH 6.0. The conjugation reactions were allowed to
proceed overnight at room temperature. The conjugated Sepharose was pooled and
loaded onto fritted columns, washed with 10 mL of BBS, blocked with 10 mL of 1 M
glycine, washed with 10 mL of 0.1 M glycine adjusted to pH 2.5 with HCl, and
re-neutralized in BBS. The column was washed until the optical density at 280 nm
reached baseline.

Affinity Purification of the Antibody

The peptide affinity column was attached to a UV monitor and chart recorder.
The titered rabbit antiserum was thawed and pooled. The serum was diluted with one
volume of BBS and allowed to flow through the columns at 10 mL per minute. The
non-peptide immunoglobulins and other proteins were washed from the column with
excess BBS until the optical density at 280 nm reached baseline. The columns were
disconnected and the affinity column was eluted using a stepwise pH gradient
from pH 7.0 to pH 1.0. The elution was monitored at 280 nm, and fractions
containing antibody (pH 3.0 to pH 1.0) were collected directly into excess 0.5 M
BBS. Excess buffer (0.5 M BBS) in the collection tubes served to neutralize the
antibodies collected in the acidic fractions of the pH gradient.

The entire procedure was repeated with "depleted" serum to ensure maximal recovery
of antibodies. The eluted material was concentrated using a stirred cell apparatus and
a membrane with a molecular weight cutoff of 30 kD. The concentration of the final
preparation was determined using an optical density reading at 280 nm. The
concentration was determined using the following formula: mg/mL = OD280/1.4.
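
The concentration calculation can be sketched as follows, reading the formula as the optical density at 280 nm divided by 1.4 (the approximate absorbance of a 1 mg/mL IgG solution in a 1 cm light path). The function name and the optional dilution factor are illustrative assumptions, not part of the described procedure.

```python
def igg_mg_per_ml(od280, dilution_factor=1.0, extinction=1.4):
    """Estimate antibody concentration (mg/mL) from an OD reading at 280 nm.

    od280           -- measured optical density (1 cm path length assumed)
    dilution_factor -- hypothetical: fold dilution applied before reading
    extinction      -- OD of a 1 mg/mL solution; 1.4 is the value in the text
    """
    if od280 < 0:
        raise ValueError("optical density cannot be negative")
    return od280 * dilution_factor / extinction


# An undiluted sample reading 0.70 corresponds to 0.5 mg/mL.
print(igg_mg_per_ml(0.70))
```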


Example 7 

SDS-PAGE and Immunoblot Analysis of BSTP-ECG1 

To investigate the expression pattern of BSTP-ECG1, extracts are made from a
variety of different cell lines and subjected to SDS-PAGE followed by
immunoblotting according to the protocol below, using an affinity purified polyclonal
antibody to BSTP-ECG1 prepared as described in Example 6.

Materials 

• Acetic acid, Glacial (Cat. No. A38°-212, Fisher)
• Acrylamide (Cat. No. A-3553, Sigma)
• Anti-Rabbit IgG (H&L) (Cat. No. 31460ZZ, Pierce)
• Bis-acrylamide (Cat. No. M-7279, Sigma)
• Blotting paper (Cat. No. 170-3960, Bio-Rad, Hercules, CA)
• Bovine Serum Albumin (LP) (Cat. No. 100-350, Boehringer Mannheim,
Indianapolis, IN)
• Brilliant Blue R-250 (Cat. No. BP101-25, Fisher)
• Complete™ Mini (Cat. No. 1836153, Boehringer Mannheim)
• ECL Western Blotting Detection Reagents (Cat. No. RPN2106, Amersham
Pharmacia Biotech, Piscataway, NJ)
• Ethyl alcohol (AAPER Alcohol and Paper Chemical Co., Shelbyville, KY)
• Gelplate Clean (Cat. No. 786-140RF, Geno Technology, Inc., St Louis)
• Gelatin (Cat. No. G-2500, Sigma)
• Glycerol (Cat. No. BP229-1, Fisher)
• Glycine (Cat. No. G-8898, Sigma)
• Hybond ECL (Cat. No. RPN303D, Amersham Pharmacia Biotech)
• Lauryl Sulfate (SDS) (Cat. No. L-3771, Sigma)
• Methanol (Cat. No. BP1105-4, Fisher)
• M-Per (Cat. No. 78501, Pierce, Rockford, IL)
• Nalgene bottle top filters (Cat. No. 09-740-62B, Fisher)
• Nonfat dry milk (Kroger Co., Cincinnati, OH)
• Ponceau-S (Cat. No. P-07170, Sigma)
• Potassium phosphate (Cat. No. P-0662, Sigma)
• 2X SDS gel loading buffer (Cat. No. 750006, Research Genetics, Huntsville, AL)
• Size markers (Cat. No. M-3913, M-4038, M-3788, Sigma)
• Sodium azide (Cat. No. S227I-25, Fisher)
• Sodium chloride (Cat. No. S271-3, Fisher)
• Sodium phosphate, Dibasic, Anhydrous (Cat. No. BP332-1, Fisher)
• t-amyl alcohol (Cat. No. A-l 6852, Sigma)
• TEMED (Cat. No. T-9281, Sigma)
• Trizma® Base (Cat. No. T-6066, Sigma)
• Tween-20 (Cat. No. BP337-500, Fisher)

Solutions 

• PBS - Phosphate Buffered Saline dissolved in distilled water
- 136 mM NaCl
- 2.7 mM KCl
- 10.1 mM Na2HPO4
- 1.8 mM KH2PO4

• Acrylamide/Bis (30% T, 2.67% C) dissolved in distilled water
- 4.1 M acrylamide
- 51.9 mM N,N'-

• 1.5 M Tris-HCl (pH 8.8) dissolved in distilled water

• 0.5 M Tris-HCl (pH 6.8) dissolved in distilled water

• 10% SDS - dissolve 10 grams SDS in 100 mL distilled water

• Running Buffer
- 24.8 mM Tris base
- 191.9 mM glycine
- 3.5 mM SDS

• Towbin transfer buffer (pH 8.3) dissolved in distilled water
- 20% methanol
- 25 mM Tris
- 192 mM glycine

• Equilibrating buffer for gel drying, mixed in distilled water
- 20% ethanol
- 10% glycerol

• Gel staining solution dissolved in distilled water
- 0.3 mM Coomassie brilliant blue R-250
- 40% methanol
- 7% glacial acetic acid

• Gel destaining solution mixed in distilled water
- 25% methanol
- 7% glacial acetic acid

• 10% Tween® 20 in PBS

• 5% Nonfat dry milk in PBS

• 0.2% BSA Blocking Buffer dissolved in PBS
- 0.2% BSA
- 0.1% gelatin
- 0.05% Tween® 20

• Wash Buffer
- 0.05% Tween® 20
- 1X PBS

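
The molar concentrations listed for the Acrylamide/Bis stock follow from its %T and %C values, as the sketch below illustrates. It assumes the cross-linker is N,N'-methylenebisacrylamide (the corresponding line of the list is truncated in the source) and uses standard molecular weights; the function name is hypothetical.

```python
ACRYLAMIDE_MW = 71.08  # g/mol
BIS_MW = 154.17        # g/mol, assumed cross-linker: N,N'-methylenebisacrylamide


def monomer_molarity(percent_t, percent_c):
    """Convert a %T / %C gel stock specification into molar concentrations.

    %T is total monomer (acrylamide + cross-linker) in g per 100 mL;
    %C is the cross-linker as a weight percentage of total monomer.
    Returns (acrylamide in mol/L, cross-linker in mmol/L).
    """
    total_g_per_l = percent_t * 10.0
    bis_g_per_l = total_g_per_l * percent_c / 100.0
    acrylamide_g_per_l = total_g_per_l - bis_g_per_l
    return acrylamide_g_per_l / ACRYLAMIDE_MW, bis_g_per_l / BIS_MW * 1000.0


acrylamide_molar, bis_millimolar = monomer_molarity(30.0, 2.67)
print(f"acrylamide: {acrylamide_molar:.2f} M, cross-linker: {bis_millimolar:.1f} mM")
```

For the 30% T, 2.67% C stock this gives roughly 4.1 M acrylamide and about 52 mM cross-linker, in line with the listed values.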
Equipment 

• Microcentrifuge (Model 5415, Eppendorf)
• Power Pak 200 (Cat. No. 165-5052, Bio-Rad)
• Power Pak 3000 (Cat. No. 165-5056, Bio-Rad)
• Protean II xi Cell (Cat. No. 165-1813, Bio-Rad)
• Recirculating chiller (Cat. No. CFT33D1 1 5 V, Neslab Instruments, Inc.,
Portsmouth, NH)
• 20-Well comb (Cat. No. 165-1867, Bio-Rad)
• pH Meter Corning 240 (Corning Science Products, Corning Glasswares, Corning,
NY)
• Air Cadet vacuum pump (Cat. No. P-07530-50, Cole-Parmer Instruments Co.,
Chicago, IL)
• Tissue Tearor tissue homogenizer (Cat. No. 985370-07, BioSpec Products Inc.,
Bartlesville, OK)

Methods 

Sample Preparation 

Preferably a variety of cell lines known in the art are used for the experiment,
including cell lines derived from breast tumors, cell lines derived from normal breast
tissue, cell lines derived from other cancer types, etc. A selection of appropriate
cancer cell lines for investigation of the expression of BSTP-ECG1 is found in
reference 21. A selection of noncancer cell lines appropriate for investigation of the
expression of BSTP-ECG1 is found in Perou, et al., Molecular portraits of human
breast tumours, Nature, 406(6797):747-52, 2000. Appropriate cell lines include
MCF7, Hs578T, OVCAR3, HepG2, NTERA2, MOLT4, RPMI-8226, NB4+ATRA,
UACC-62, SW872, and Colo205 (also see Table 2 for more details). Cell lines are
maintained under standard growth conditions and in standard tissue culture media as
appropriate for the particular cell line. Cells are collected according to standard
techniques (e.g., trypsinization in the case of adherent cells), and the resulting cell
suspension is prepared as follows:

-The cell suspension is pelleted by centrifugation at 3000 RPM for 10 minutes, and
the supernatant is discarded.

-The pellet is washed with 1 mL PBS, centrifuged at 10000 RPM for 10 minutes, and
the supernatant is discarded.

-An appropriate volume of M-Per™ Reagent is added to the cell pellet and mixed
gently for 10 minutes in an ice bath. The mixture is centrifuged at 13200 RPM for 15
minutes, and the supernatant is saved.

The protein concentration in the supernatant is measured according to standard
techniques.

All samples are mixed at 1:1 with gel loading buffer and boiled for 5 minutes before
loading.

SDS PAGE 

Standard SDS-PAGE stacking and running gels are prepared and placed in an
electrophoresis apparatus. After filling the upper and lower chambers with running
buffer, the samples (60 µg/lane) are loaded. The inner core is placed in the lower
chamber and the lid placed on top. The apparatus is connected to the power supply
and recirculating system. The temperature setting is 10°C. The stacking gel is run at
14 mA per gel for 1 hour. The separating gel is run at 0.58 mA per gel per hour for 16
hours.
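
The volume to load per lane follows from the measured protein concentration of each extract. The sketch below assumes the 60 µg target refers to protein in the lysate and that each sample has been mixed 1:1 with gel loading buffer, as described under Sample Preparation; the function and parameter names are illustrative.

```python
def load_volume_ul(protein_ug_per_ul, target_ug=60.0, buffer_ratio=1.0):
    """Return the volume (in microliters) of prepared sample to load per lane.

    protein_ug_per_ul -- measured protein concentration of the lysate
    target_ug         -- protein per lane (60 micrograms in the text)
    buffer_ratio      -- volumes of loading buffer per volume of lysate
                         (1.0 corresponds to the 1:1 mix in the text)
    """
    if protein_ug_per_ul <= 0:
        raise ValueError("protein concentration must be positive")
    lysate_ul = target_ug / protein_ug_per_ul
    return lysate_ul * (1.0 + buffer_ratio)


# A lysate at 4 micrograms per microliter needs 15 uL of lysate,
# i.e. 30 uL of the 1:1 sample/loading buffer mixture per lane.
print(load_volume_ul(4.0))
```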

Transfer to nitrocellulose

After electrophoresis is complete, the gel is equilibrated in Towbin Buffer for 15-30
minutes. The assembly for transfer is as follows:
cathode
pre-soaked blotting paper
gel
pre-wetted nitrocellulose
pre-soaked blotting paper
anode

The transfer is performed at 20V for 25 minutes, then 25V for 20 minutes. After the
transfer is complete, the gel is stained with Coomassie and the blot is stained with
Ponceau-S.

Western Blotting 

Primary and secondary antibodies 

All primary and secondary antibodies are diluted in 0.2% BSA blocking buffer. All
incubation steps are done with gentle mixing.
Blots are blocked in 5% milk overnight at room temperature. The blots are rinsed with
wash buffer before adding the primary antibody and incubating for two hours at room
temperature.

One wash cycle is performed. One wash cycle consists of:
Wash 5 min, rinse
Wash 5 min, rinse
Wash 10 min, rinse
Wash 5 min, rinse
Wash 5 min, rinse

The secondary antibody is added and incubated for one hour at room temperature.
One wash cycle is then performed.

Peptide Block 

As a control to demonstrate the specificity of the antibody, equal amounts (w/w) of
peptide and antibody are added to 1/10 of the final volume of blocking buffer and
incubated overnight at 4°C. The volume of blocking buffer is then brought up to the
final volume, and the membrane is incubated for an additional two hours at room
temperature.

Developing 

The blots are placed in a Ziploc® bag. Equal volumes of ECL western blotting
detection reagents are mixed and distributed evenly over the blots. The blots are
placed in an autoradiography cassette, covered with a piece of film, and exposed.

Example 8

Detection of BSTP-ECG1 in Breast Cancer Samples by Immunohistochemistry 

A breast cancer tissue microarray consisting of tissue samples from a large
number of breast cancer biopsies is prepared essentially as described in Kononen, J.,
et al., Tissue microarrays for high-throughput molecular profiling of tumor specimens,
Nature Medicine, 4(7), 844-847, 1998. Briefly, several hundred archival
paraffin-embedded breast tumor samples were obtained from the Pathology Department at
Stanford University Medical Center. The samples were reviewed by a pathologist
(applicant MVR) to ensure that they met pathological criteria for breast cancer. Small
tissue cores were removed from the samples and embedded in a single paraffin block
to produce a tissue array. Immunohistochemistry is performed as described
previously (Perou, C., et al., 1999; Bindl, J. and Warnke, R., Am J Clin Pathol, 85,
490-493, 1986; and Natkunam, Y., et al., Am. J. Path., 156(1), 2000, the contents of
which are incorporated herein by reference) using an antibody to BSTP-ECG1
generated as described in Example 6.

Example 10 

Expression of BST-ECG1 mRNA in Cell Lines

Materials and Methods 

Cell lines were grown from frozen stocks in RPMI-1640 supplemented with
10% fetal bovine serum. Total RNA was extracted at approximately 80-90%
confluence using TRIZOL reagent (Life Technologies) according to the
manufacturer's protocol. Following total RNA extraction, mRNA was isolated with
the FastTrack 2.0 kit (Invitrogen) following the manufacturer's protocol for isolating
mRNA from total RNA. HepG2 is a liver tumor derived cell line (ATCC #HB-8065);
COLO205 is a colon tumor derived cell line (ATCC #CCL-222); and MCF-7 is a
breast adenocarcinoma derived cell line (ATCC #HTB-22).
Five micrograms of each mRNA and an RNA ladder ranging from 0.24 to 9.49 kb
(Life Technologies) were size separated and blotted onto Hybond N+ membrane
(Amersham). The membrane was hybridized at 42°C to a 32P-labeled cDNA probe of
the BST-ECG1 gene in Hybrisol (Oncor) with 20 µg sheared DNA (Research
Genetics) and 5 µg human Cot-1 DNA (Life Technologies). Following hybridization,
the nylon membrane was exposed to a phosphor screen. The digital image of the
Northern was then acquired using the Packard Cyclone phosphor imager in order to
assess the size of the BST-ECG1 transcript.
Results 

The Northern blot showed 2 bands of approximately 1.5 and 22 kb, consistent
with the prediction of multiple isoforms due to alternate 3' processing (alternate
polyadenylation sites). Expression of BST-ECG1 is present in all three cell types
tested (Figure 6A). The highest level of expression was observed in the HepG2 cell
line. Figure 6B presents a longer exposure demonstrating expression in MCF-7 cells.

CLAIMS

We claim:

1. A substantially purified polypeptide whose sequence comprises the polypeptide
sequence of SEQ ID NO: 1.

1a. A fragment of the polypeptide of claim 1, wherein the fragment is at least 50
amino acids in length.

1b. A fragment of the polypeptide of claim 1, wherein the fragment is at least 100
amino acids in length.

1c. A fragment of the polypeptide of claim 1, wherein the fragment is at least 150
amino acids in length.

1d. A variant of the polypeptide of claim 1, wherein the variant includes between 1
and 10 amino acid substitutions, inclusive.

1e. A variant of the polypeptide of claim 1, wherein the variant includes between 11
and 25 amino acid substitutions, inclusive.

1f. A variant of the polypeptide of claim 1, wherein the variant includes between 26
and 50 amino acid substitutions, inclusive.

1g. A variant of the polypeptide of claim 1, wherein the variant includes an addition
or substitution of between 1 and 10 amino acids, inclusive.

1h. A variant of the polypeptide of claim 1, wherein the variant includes an addition
or substitution of between 11 and 25 amino acids, inclusive.

1i. A substantially purified polypeptide having significant similarity to a polypeptide
whose sequence is set forth in SEQ ID NO: 1, wherein a polypeptide is considered
significantly similar if, when the amino acid sequence of the polypeptide is compared
with the amino acid sequence of the polypeptide of SEQ ID NO: 1 using the BLAST
algorithm and the BLOSUM substitution matrix with default parameters, the result is
a % identity greater than 60 or a % positive greater than 70 encompassing at least 25%
of the length of SEQ ID NO: 1, or both.

1j. A substantially purified polypeptide having significant similarity to a polypeptide
whose sequence is set forth in SEQ ID NO: 1, wherein a polypeptide is considered
significantly similar if, when the amino acid sequence of the polypeptide is compared
with the amino acid sequence of the polypeptide of SEQ ID NO: 1 using the BLAST
algorithm and the BLOSUM substitution matrix with default parameters, the result is
a % identity greater than 60 or a % positive greater than 70 encompassing at least 50%
of the length of SEQ ID NO: 1, or both.

1jj. A substantially purified polypeptide having significant similarity to a polypeptide
whose sequence is set forth in SEQ ID NO: 1, wherein a polypeptide is considered
significantly similar if, when the amino acid sequence of the polypeptide is compared
with the amino acid sequence of the polypeptide of SEQ ID NO: 1 using the BLAST
algorithm and the BLOSUM substitution matrix with default parameters, the result is
a % identity greater than 60 or a % positive greater than 70 encompassing at least 75%
of the length of SEQ ID NO: 1, or both.

1k. A substantially purified polypeptide having significant similarity to a polypeptide
whose sequence is set forth in SEQ ID NO: 1, wherein a polypeptide is considered
significantly similar if, when the amino acid sequence of the polypeptide is compared
with the amino acid sequence of the polypeptide of SEQ ID NO: 1 using the BLAST
algorithm and the BLOSUM substitution matrix with default parameters, the result is
a % identity greater than 70 or a % positive greater than 80 encompassing at least 25%
of the length of SEQ ID NO: 1, or both.

1l. A substantially purified polypeptide having significant similarity to a polypeptide
whose sequence is set forth in SEQ ID NO: 1, wherein a polypeptide is considered
significantly similar if, when the amino acid sequence of the polypeptide is compared
with the amino acid sequence of the polypeptide of SEQ ID NO: 1 using the BLAST
algorithm and the BLOSUM substitution matrix with default parameters, the result is
a % identity greater than 70 or a % positive greater than 80 encompassing at least 50%
of the length of SEQ ID NO: 1, or both.

1ll. A substantially purified polypeptide having significant similarity to a polypeptide
whose sequence is set forth in SEQ ID NO: 1, wherein a polypeptide is considered
significantly similar if, when the amino acid sequence of the polypeptide is compared
with the amino acid sequence of the polypeptide of SEQ ID NO: 1 using the BLAST
algorithm and the BLOSUM substitution matrix with default parameters, the result is
a % identity greater than 70 or a % positive greater than 80 encompassing at least 75%
of the length of SEQ ID NO: 1, or both.

1m. A substantially purified polypeptide having significant similarity to a polypeptide
whose sequence is set forth in SEQ ID NO: 1, wherein a polypeptide is considered
significantly similar if, when the amino acid sequence of the polypeptide is compared
with the amino acid sequence of the polypeptide of SEQ ID NO: 1 using the BLAST
algorithm and the BLOSUM substitution matrix with default parameters, the result is
a % identity greater than 80 or a % positive greater than 90 encompassing at least 25%
of the length of SEQ ID NO: 1, or both.

1n. A substantially purified polypeptide having significant similarity to a polypeptide
whose sequence is set forth in SEQ ID NO: 1, wherein a polypeptide is considered
significantly similar if, when the amino acid sequence of the polypeptide is compared
with the amino acid sequence of the polypeptide of SEQ ID NO: 1 using the BLAST
algorithm and the BLOSUM substitution matrix with default parameters, the result is
a % identity greater than 80 or a % positive greater than 90 encompassing at least 50%
of the length of SEQ ID NO: 1, or both.

1nn. A substantially purified polypeptide having significant similarity to a
polypeptide whose sequence is set forth in SEQ ID NO: 1, wherein a polypeptide is
considered significantly similar if, when the amino acid sequence of the polypeptide is
compared with the amino acid sequence of the polypeptide of SEQ ID NO: 1 using the
BLAST algorithm and the BLOSUM substitution matrix with default parameters, the
result is a % identity greater than 80 or a % positive greater than 90 encompassing at
least 75% of the length of SEQ ID NO: 1, or both.

2. A purified and isolated polynucleotide comprising the polynucleotide sequence of
SEQ ID NO: 2.

3. A purified and isolated polynucleotide comprising the polynucleotide sequence of
SEQ ID NO: 3.

5. A purified and isolated polynucleotide having a sequence that is complementary to
the polynucleotide sequence of claim 2 or 3.

6. An isolated and purified polynucleotide encoding a polypeptide whose amino acid
sequence comprises the amino acid sequence of SEQ ID NO: 1.

6a. An isolated and purified polynucleotide encoding a polypeptide having significant
similarity to the polypeptide having a sequence set forth in SEQ ID NO: 1, wherein a
polypeptide having significant similarity to the polypeptide having a sequence set forth
in SEQ ID NO: 1 is defined as in claim 1i, 1j, 1k, 1l, 1m, or 1n.

7. An isolated and purified polynucleotide that hybridizes to the polynucleotide of
SEQ ID NO: 2 under stringent conditions.

7a. An isolated and purified polynucleotide that hybridizes to the polynucleotide of
SEQ ID NO: 3 under stringent conditions.



WO 02/08260    PCT/US01/23439

7e. An isolated and purified polynucleotide that hybridizes to the polynucleotide of
SEQ ID NO: 2 under moderately stringent conditions.

7f. An isolated and purified polynucleotide that hybridizes to the polynucleotide of
SEQ ID NO: 3 under moderately stringent conditions.

7j. An isolated and purified polynucleotide that hybridizes to the polynucleotide of
claim 6 under stringent conditions.

7k. An isolated and purified polynucleotide that hybridizes to the polynucleotide of
claim 6a under stringent conditions.

7l. An isolated and purified polynucleotide that hybridizes to the polynucleotide of
claim 6 under moderately stringent conditions.

7m. An isolated and purified polynucleotide that hybridizes to the polynucleotide of
claim 6a under moderately stringent conditions.

10. An expression vector comprising the polynucleotide of claim 6.

10a. An expression vector comprising the polynucleotide of claim 6a.

11. An expression vector comprising a polynucleotide that encodes the fragment of
any of claims 1a, 1b, or 1c.

12. An expression vector comprising a polynucleotide that encodes the variant of any
of claims 1d, 1e, 1f, 1g, or 1h.

20. A host cell comprising the expression vector of claim 10.

20a. A host cell comprising the expression vector of claim 10a.

30. A method of producing a polypeptide comprising an amino acid sequence selected
from the group consisting of SEQ ID NO: 1 and a polypeptide whose sequence has
significant similarity to the amino acid sequence of SEQ ID NO: 1, the method
comprising the steps of:
culturing the host cell of claim 20 under conditions wherein the polypeptide is
expressed; and
recovering the polypeptide from the host cell culture.

50. A pharmaceutical composition comprising:
the polypeptide of claim 1, or a polypeptide whose sequence has significant
similarity to the amino acid sequence of SEQ ID NO: 1; and
a pharmaceutically acceptable carrier.

55. A pharmaceutical composition comprising:
the polynucleotide of claim 6; and
a pharmaceutically acceptable carrier.

60. A purified antibody that specifically binds to the polypeptide of claim 1.

65. A purified antibody that specifically binds to the polypeptide of SEQ ID NO: 1.

65a. A purified antibody that specifically binds to a polypeptide having significant
similarity to the polypeptide having a sequence set forth in SEQ ID NO: 1, wherein a
polypeptide having significant similarity to the polypeptide having a sequence set forth
in SEQ ID NO: 1 is defined as in claim 1i, 1j, 1k, 1l, 1m, or 1n.

66. The antibody of claim 60 or claim 65, wherein the antibody is a polyclonal
antibody.

67. The antibody of claim 60 or claim 65, wherein the antibody is a monoclonal
antibody.

70. A pharmaceutical composition comprising:
the antibody of claim 60 or claim 65; and
a pharmaceutically acceptable carrier.

80. A method for detecting a polynucleotide that encodes a polypeptide comprising an
amino acid sequence set forth in SEQ ID NO: 1 in a biological sample, the method
comprising steps of:
(a) hybridizing a nucleic acid complementary to the polynucleotide that
encodes a polypeptide comprising an amino acid sequence set forth in SEQ ID NO: 1,
or that encodes a fragment or variant of the polypeptide, to at least one nucleic acid in
the biological sample, thereby forming a hybridization complex; and
(b) detecting the hybridization complex, wherein the presence of the
hybridization complex indicates the presence of a polynucleotide encoding the
polypeptide in the biological sample.

81. The method of claim 80, wherein the biological sample comprises breast cancer
tissue or cells.

82. The method of claim 80, wherein the biological sample comprises normal breast
tissue or cells.

85. A method for detecting a polynucleotide that encodes a polypeptide comprising an
amino acid sequence set forth in SEQ ID NO: 1 comprising steps of:
(a) hybridizing a nucleic acid that encodes a polypeptide comprising an amino
acid sequence set forth in SEQ ID NO: 1 to at least one nucleic acid complementary to
a nucleic acid in the biological sample, thereby forming a hybridization complex; and
(b) detecting the hybridization complex, wherein the presence of the
hybridization complex indicates the presence of the polynucleotide in the biological
sample.

86. The method of claim 85, wherein the biological sample comprises breast cancer
tissue or cells.

87. The method of claim 85, wherein the biological sample comprises normal breast
tissue or cells.

90. A method for detecting a polypeptide whose sequence comprises the amino acid
sequence set forth in SEQ ID NO: 1 in a biological sample comprising steps of:
(a) contacting the biological sample with the antibody of claim 60 or claim 65;
and
(b) determining whether the antibody specifically binds to the sample, the
binding being an indication that the sample contains the polypeptide.

91. The method of claim 90, wherein the biological sample comprises breast cancer
tissue or cells.

92. The method of claim 90, wherein the biological sample comprises a cell, tissue,
blood, urine, serum, ascites, saliva, or another body fluid, secretion, or excretion.

93. The method of claim 90, wherein the determining step comprises performing an
enzyme-linked immunosorbent assay.

94. The method of claim 90, wherein the determining step comprises performing
immunohistochemistry.

95. The method of claim 90, wherein the determining step comprises contacting an
antibody array with the sample.

90a. A method for detecting a polypeptide having significant similarity to the
polypeptide whose sequence is set forth in SEQ ID NO: 1 in a biological sample
comprising steps of:
(a) contacting the biological sample with the antibody of claim 60 or claim 65;
and
(b) determining whether the antibody specifically binds to the sample, the
binding being an indication that the sample contains the polypeptide.

91a. The method of claim 90a, wherein the biological sample comprises breast cancer
tissue or cells.

92a. The method of claim 90a, wherein the biological sample comprises a cell, tissue,
blood, urine, serum, ascites, saliva, or another body fluid, secretion, or excretion.

93a. The method of claim 90a, wherein the determining step comprises performing an
enzyme-linked immunosorbent assay.

94a. The method of claim 90a, wherein the determining step comprises performing
immunohistochemistry.

95a. The method of claim 90a, wherein the determining step comprises contacting an
antibody array with the sample.

100. A method for treating or preventing a disorder of cell proliferation, the method
comprising administering to a subject in need of such treatment an effective amount
of the pharmaceutical composition of any of claims 50, 55, or 70.

150. A method for classifying a disease comprising the steps of:
(a) providing a sample from a subject;
(b) detecting the presence of a polynucleotide that encodes a polypeptide
having the sequence of SEQ ID NO: 1 within the sample; and
(c) assigning the disease to one of a set of predetermined categories based on
detection of the polynucleotide.

151. The method of claim 150, wherein the disease is cancer.

152. The method of claim 150, wherein the disease is breast cancer.

153. The method of claim 150, wherein the detecting step comprises a nucleic acid
amplification step.

154. The method of claim 150, wherein the sample comprises a cell, tissue, blood,
urine, saliva, ascites, other body fluid, secretion, or excretion.

150a. A method for classifying a disease comprising the steps of:
(a) providing a sample from a subject;
(b) detecting the presence of a polynucleotide that encodes a polypeptide
having significant similarity to a polypeptide whose sequence comprises the sequence
of SEQ ID NO: 1 within the sample; and
(c) assigning the disease to one of a set of predetermined categories based on
detection of the polynucleotide.

151a. The method of claim 150a, wherein the disease is cancer.

152a. The method of claim 150a, wherein the disease is breast cancer.

153a. The method of claim 150a, wherein the detecting step comprises a nucleic acid
amplification step.

154a. The method of claim 150a, wherein the sample comprises a cell, tissue, blood,
urine, saliva, ascites, other body fluid, secretion, or excretion.

155. The method of claim 150, further comprising the step of:
providing diagnostic, prognostic, or predictive information based on the
predetermined category assigned in the assigning step.

30 155a. The method of claim 1 50a, further comprising the step of: 

3 1 providing diagnostic, prognostic, or predictive information based on the 

32 predetermined category assigned in the assigning step. 
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1 

156. The method of claim 155 or claim 155a, wherein the disease is breast cancer.

160. A method for classifying a disease comprising the steps of:
(a) providing a sample from a subject;
(b) detecting the presence of a polypeptide comprising the sequence of SEQ ID NO:1 within the sample; and
(c) assigning the disease to one of a set of predetermined categories based on detection of the polynucleotide.

161. The method of claim 160, wherein the disease is cancer.

162. The method of claim 160, wherein the disease is breast cancer.

164. The method of claim 160, wherein the sample comprises a cell, tissue, blood, urine, saliva, ascites, other body fluid, secretion, or excretion.

165. The method of claim 160, further comprising the step of providing diagnostic, prognostic, or predictive information based on the category assigned in the assigning step.

166. The method of claim 160, wherein the detecting step is performed using an antibody that specifically binds to the polypeptide of SEQ ID NO:1.

166a. The method of claim 166, wherein the determining step comprises performing an enzyme-linked immunosorbent assay.

166b. The method of claim 166, wherein the determining step comprises performing immunohistochemistry.

166c. The method of claim 166, wherein the determining step comprises contacting an antibody array with the sample.

170. A method for classifying a disease comprising the steps of:
(a) providing a sample from a subject;
(b) detecting the presence of a polypeptide having significant similarity to a polypeptide comprising the sequence of SEQ ID NO:1 within the sample; and
(c) assigning the disease to one of a set of predetermined categories based on detection of the polynucleotide.

171. The method of claim 170, wherein the disease is cancer.

172. The method of claim 170, wherein the disease is breast cancer.

174. The method of claim 170, wherein the sample comprises a cell, tissue, blood, urine, saliva, ascites, other body fluid, secretion, or excretion.

175. The method of claim 170, further comprising the step of providing diagnostic, prognostic, or predictive information based on the category assigned in the assigning step.

176. The method of claim 170, wherein the detecting step is performed using an antibody that specifically binds to the polypeptide.

176a. The method of claim 176, wherein the determining step comprises performing an enzyme-linked immunosorbent assay.

176b. The method of claim 176, wherein the determining step comprises performing immunohistochemistry.

176c. The method of claim 176, wherein the determining step comprises contacting an antibody array with the sample.

180. A method for obtaining prognostic, diagnostic, or therapeutic information comprising steps of:
(i) obtaining a sample containing cells or tissue from a subject; and
(ii) detecting, within the sample, a mutation in a regulatory or coding region of a gene that encodes the BSTP-ECG1 polypeptide of SEQ ID NO:1.

220. A method of inhibiting the growth of a cell comprising enhancing the level or activity of a polypeptide comprising the amino acid sequence of SEQ ID NO:1 or a polypeptide having significant similarity to the polypeptide of SEQ ID NO:1 in the cell.

226. The method of claim 220, wherein the cell is a tumor cell.

250. A method of treating or preventing a tumor comprising steps of:
(i) providing an individual in need of treatment or prevention of a tumor;
(ii) administering a compound that enhances the level or activity of a polypeptide whose sequence comprises the amino acid sequence of SEQ ID NO:1.

260. A method of treating or preventing a tumor comprising steps of:
(i) providing an individual in need of treatment or prevention of a tumor;
(ii) administering a compound that reduces or inhibits the level or activity of a polypeptide whose sequence comprises the amino acid sequence of SEQ ID NO:1.

310. A method of inhibiting the growth of a cell comprising reducing the level or activity of a polypeptide comprising the amino acid sequence of SEQ ID NO:1 in the cell.

400. A diagnostic kit comprising:
an antibody that specifically binds to the polypeptide of SEQ ID NO:1;
instructions for use of the antibody;
and a control sample, wherein the antibody specifically binds to a polypeptide in the control sample.

601. A method of classifying a tumor comprising the steps of:
providing a tumor sample;
detecting expression or activity of a gene encoding the polypeptide of SEQ ID NO:1 in the sample; and
classifying the tumor as belonging to a tumor subclass based on the results of the detecting step.

605. The method of claim 601, wherein the detecting step comprises detecting the polypeptide.

606. The method of claim 605, wherein the polypeptide is detected by performing immunohistochemical analysis on the sample using an antibody that specifically binds to the polypeptide.

606a. The method of claim 605, wherein the polypeptide is detected by performing an ELISA assay using an antibody that specifically binds to the polypeptide.

606b. The method of claim 605, wherein the polypeptide is detected using an antibody array comprising an antibody that specifically binds to the polypeptide.

606c. The method of claim 605, wherein the detecting step comprises:
detecting modification of a substrate by the polypeptide.

607. The method of claim 601, wherein classifying a tumor comprises:
stratifying a subject having the tumor for a clinical trial.

608. The method of claim 607, wherein the tumor is a breast tumor.

609. The method of claim 601, wherein the tumor is a breast tumor and the tumor subclass is a luminal tumor subclass.

601a. The method of claim 601, further comprising:
providing diagnostic, prognostic, or predictive information based on the classifying step.

605a. The method of claim 605, further comprising:
providing diagnostic, prognostic, or predictive information based on the classifying step.

606aa. The method of claim 605a, wherein the polypeptide is detected by performing immunohistochemical analysis on the sample using an antibody that specifically binds to the polypeptide.

606ab. The method of claim 605a, wherein the polypeptide is detected by performing an ELISA assay using an antibody that specifically binds to the polypeptide.

606ac. The method of claim 605a, wherein the polypeptide is detected using an antibody array comprising an antibody that specifically binds to the polypeptide.

606ad. The method of claim 605a, wherein the detecting step comprises:
detecting modification of a substrate by the polypeptide.

609a. The method of claim 601a, wherein the tumor is a breast tumor and the tumor subclass is a luminal tumor subclass.

601g. The method of claim 601, further comprising:
selecting a treatment based on the classifying step.

605g. The method of claim 605, further comprising:
selecting a treatment based on the classifying step.

606ag. The method of claim 605g, wherein the polypeptide is detected by performing immunohistochemical analysis on the sample using an antibody that specifically binds to the polypeptide.

606bg. The method of claim 605g, wherein the polypeptide is detected by performing an ELISA assay using an antibody that specifically binds to the polypeptide.

606cg. The method of claim 605g, wherein the polypeptide is detected using an antibody array comprising an antibody that specifically binds to the polypeptide.

606dg. The method of claim 605g, wherein the detecting step comprises:
detecting modification of a substrate by the polypeptide.

609g. The method of claim 601g, wherein the tumor is a breast tumor and the tumor subclass is a luminal tumor subclass.

601m. A method of testing a subject comprising the steps of:
providing a sample isolated from a subject;
detecting expression or activity of a gene encoding the polypeptide of SEQ ID NO:1 in the sample; and
providing diagnostic, prognostic, or predictive information based on the detecting step.

605m. The method of claim 601m, wherein the detecting step comprises detecting the polypeptide.

606m. The method of claim 605m, wherein the polypeptide is detected by performing immunohistochemical analysis on the sample using an antibody that specifically binds to the polypeptide.

606ma. The method of claim 605m, wherein the polypeptide is detected by performing an ELISA assay using an antibody that specifically binds to the polypeptide.

606mb. The method of claim 605m, wherein the polypeptide is detected using an antibody array comprising an antibody that specifically binds to the polypeptide.

606mc. The method of claim 605m, wherein the detecting step comprises:
detecting modification of a substrate by the polypeptide.

609m. The method of any of claim 601m, wherein the sample is selected from the group consisting of:
a blood sample, a urine sample, a serum sample, an ascites sample, a saliva sample, a cell, and a portion of tissue.

610m. The method of claim 601m, wherein the sample is a tumor sample.

611m. The method of claim 610m, wherein the tumor sample is a breast tumor sample.

601r. A method of testing a subject comprising the steps of:
providing a sample isolated from a subject;
detecting expression or activity of a gene encoding the polypeptide of SEQ ID NO:1 in the sample; and
stratifying the subject for a clinical trial based on the detecting step.

605r. The method of claim 601r, wherein the detecting step comprises detecting the polypeptide.

606r. The method of claim 605r, wherein the polypeptide is detected by performing immunohistochemical analysis on the sample using an antibody that specifically binds to the polypeptide.

606ra. The method of claim 605r, wherein the polypeptide is detected by performing an ELISA assay using an antibody that specifically binds to the polypeptide.

606rb. The method of claim 605r, wherein the polypeptide is detected using an antibody array comprising an antibody that specifically binds to the polypeptide.

606rc. The method of claim 605r, wherein the detecting step comprises:
detecting modification of a substrate by the polypeptide.

609r. The method of claim 601r, wherein the sample is selected from the group consisting of:
a blood sample, a urine sample, a serum sample, an ascites sample, a saliva sample, a cell, and a portion of tissue.

610r. The method of claim 601r, wherein the sample is a tumor sample.

611r. The method of claim 610r, wherein the tumor sample is a breast tumor sample.

601q. A method of testing a subject comprising the steps of:
providing a sample isolated from a subject;
detecting expression or activity of a gene encoding the polypeptide of SEQ ID NO:1 in the sample; and
selecting a treatment based on the detecting step.

605q. The method of claim 601q, wherein the detecting step comprises detecting the polypeptide.

606q. The method of claim 605q, wherein the polypeptide is detected by performing immunohistochemical analysis on the sample using an antibody that specifically binds to the polypeptide.

606qa. The method of claim 605q, wherein the polypeptide is detected by performing an ELISA assay using an antibody that specifically binds to the polypeptide.

606qb. The method of claim 605q, wherein the polypeptide is detected using an antibody array comprising an antibody that specifically binds to the polypeptide.

606qc. The method of claim 605q, wherein the detecting step comprises:
detecting modification of a substrate by the polypeptide.

609q. The method of claim 601q, wherein the sample is selected from the group consisting of:
a blood sample, a urine sample, a serum sample, an ascites sample, a saliva sample, a cell, and a portion of tissue.

610q. The method of claim 601q, wherein the sample is a tumor sample.

611q. The method of claim 610q, wherein the tumor sample is a breast tumor sample.
Figure 1A

The amino acid sequence of BSTP-ECG1 (SEQ ID NO:1) is presented below.

MKTLIAAYSGVLRGERQAEADRSQRSHGGPALSREGSGRWGTGSSILSALQDLFSVTWLNRSKVEKQLQVI
SVLQWVLSFLVLGVACSAILMYIFCTDCWLIAVLYFTWLVFDWNTPKKGGRRSQWVRNWAVWRYFRDYFPI
QLVKTHNLLTTRNYIFGYHPHGIMGLGAFCNFSTEATEVSKKFPGIRPYLATLAGNFRMPVLREYLMSGGI
CPVSRDTIDYLLSKNGSGNAIIIVVGGAAESLSSMPGKNAVTLRNRKGFVKLALRHGADLVPIYSFGENEV
YKQVIFEEGSWGRWVQKKFQKYIGFAPCIFHGRGLFSSDTWGLVPYSKPITTVVGEPITIPKLEHPTQQDI
DLYHTMYMEALVKLFDKHKTKFGLPETEVLEVN
Figure 1B

The nucleotide sequence (SEQ ID NO:2) of an open reading frame that encodes BSTP- 
ECG1 is presented below. 

ATGAAGACCCTCATAGCCGCCTACTCCGGGGTCCTGCGCGGCGAGCGTCAGGCCGAGGCTGACCGGAGCCA 
GCGCTCTCACGGAGGACCTGCGCTGTCGCGCGAGGGGTCTGGGAGATGGGGCACTGGATCCAGCATCCTCT 
CCGCCCTCCAGGACCTCTTCTCTGTCACCTGGCTCAATAGGTCCAAGGTGGAAAAGCAGCTACAGGTCATC 
TCAGTGCTCCAGTGGGTCCTGTCCTTCCTTGTACTGGGAGTGGCCTGCAGTGCCATCCTCATGTACATATT 
CTGCACTGATTGCTGGCTCATCGCTGTGCTCTACTTCACTTGGCTGGTGTTTGACTGGAACACACCCAAGA 
AAGGTGGCAGGAGGTCACAGTGGGTCCGAAACTGGGCTGTGTGGCGCTACTTTCGAGACTACTTTCCCATC 
CAGCTGGTGAAGACACACAACCTGCTGACCACCAGGAACTATATCTTTGGATACCACCCCCATGGTATCAT 
GGGCCTGGGTGCCTTCTGCAACTTCAGCACAGAGGCCACAGAAGTGAGCAAGAAGTTCCCAGGCATACGGC
CTTACCTGGCTACACTGGCAGGCAACTTCCGAATGCCTGTGTTGAGGGAGTACCTGATGTCTGGAGGTATC 
TGCCCTGTCAGCCGGGACACCATAGACTATTTGCTTTCAAAGAATGGGAGTGGCAATGCTATCATCATCGT 
GGTCGGGGGTGCGGCTGAGTCTCTGAGCTCCATGCCTGGCAAGAATGCAGTCACCCTGCGGAACCGCAAGG 
GCTTTGTGAAACTGGCCCTGCGTCATGGAGCTGACCTGGTTCCCATCTACTCCTTTGGAGAGAATGAAGTG
TACAAGCAGGTGATCTTCGAGGAGGGCTCCTGGGGCCGATGGGTCCAGAAGAAGTTCCAGAAATACATTGG
TTTCGCCCCATGCATCTTCCATGGTCGAGGCCTCTTCTCCTCCGACACCTGGGGGCTGGTGCCCTACTCCA 
AGCCCATCACCACTGTTGTGGGAGAGCCCATCACCATCCCCAAGCTGGAGCACCCAACCCAGCAAGACATC 
GACCTGTACCACACCATGTACATGGAGGCCCTGGTGAAGCTTTTCGACAAGCACAAGACCAAGTTCGGCCT 
CCCGGAGACTGAGGTCCTGGAGGTGAACTGA 
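
By way of illustration only, the correspondence between the open reading frame of Figure 1B and the amino acid sequence of Figure 1A can be checked with a short translation routine. The following Python sketch is not part of the disclosed methods; it assumes the standard genetic code, and the names `BASES`, `AMINO_ACIDS`, `CODON_TABLE`, and `translate_orf` are hypothetical names introduced here for the example.

```python
# Illustrative sketch only: translate an open reading frame with the
# standard genetic code. All names below are hypothetical.

BASES = "TCAG"
# Amino acids for the 64 codons, ordered with T, C, A, G at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_orf(nucleotides: str) -> str:
    """Translate from the first base, stopping at the first stop codon.

    Whitespace and line breaks are ignored. Codons containing characters
    other than T, C, A, G (for example ambiguity codes) are reported as "X".
    """
    seq = "".join(nucleotides.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if residue == "*":  # stop codon ends the reading frame
            break
        protein.append(residue)
    return "".join(protein)


# First seven codons of the open reading frame of Figure 1B.
print(translate_orf("ATGAAGACCCTCATAGCCGCC"))  # prints MKTLIAA
```

Applied to the complete sequence of Figure 1B, such a routine would be expected to return the sequence of Figure 1A, provided the nucleotide text is free of transcription errors.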
Figure 1C

The nucleotide sequence (SEQ ID NO:3) of a cDNA that encodes BSTP-ECG1 is
presented below. 

TCCGGGACGCCAGCGCCGCGGCTGCCGCCTCTGCTGGGGTCTAGGCTGTTTCTCTCGCGCCACCACTGGCC 
GCCGGCCGCAGCTCCAGGTGTCCTAGCCGCCCAGCCTCGACGCCGTCCCGGGACCCCTGTGCTCTGCGCGA 
AGCCCTGGCCCCGGGGGCCGGGGCATGGGCCAGGGCGCGGGGTGAAGCGGCTTCCCGCGGGGCCGTGACTG 
GGCGGGCTTCAGCCATGAAGACCCTCATAGCCGCCTACTCCGGGGTCCTGCGCGGCGAGCGTCAGGCCGAG 
GCTGACCGGAGCCAGCGCTCTCACGGAGGACCTGCGCTGTCGCGCGAGGGGTCTGGGAGATGGGGCACTGG 
ATCCAGCATCCTCTCCGCCCTCCAGGACCTCTTCTCTGTCACCTGGCTCAATAGGTCCAAGGTGGAAAAGC 
AGCTACAGGTCATCTCAGTGCTCCAGTGGGTCCTGTCCTTCCTTGTACTGGGAGTGGCCTGCAGTGCCATC 
CTCATGTACATATTCTGCACTGATTGCTGGCTCATCGCTGTGCTCTACTTCACTTGGCTGGTGTTTGACTG 
GAACACACCCAAGAAAGGTGGCAGGAGGTCACAGTGGGTCCGAAACTGGGCTGTGTGGCGCTACTTTCGAG 
ACTACTTTCCCATCCAGCTGGTGAAGACACACAACCTGCTGACCACCAGGAACTATATCTTTGGATACCAC
CCCCATGGTATCATGGGCCTGGGTGCCTTCTGCAACTTCAGCACAGAGGCCACAGAAGTGAGCAAGAAGTT 
CCCAGGCATACGGCCTTACCTGGCTACACTGGCAGGCAACTTCCGAATGCCTGTGTTGAGGGAGTACCTGA 
TGTCTGGAGGTATCTGCCCTGTCAGCCGGGACACCATAGACTATTTGCTTTCAAAGAATGGGAGTGGCAAT
GCTATCATCATCGTGGTCGGGGGTGCGGCTGAGTCTCTGAGCTCCATGCCTGGCAAGAATGCAGTCACCCT 
GCGGAACCGCAAGGGCTTTGTGAAACTGGCCCTGCGTCATGGAGCTGACCTGGTTCCCATCTACTCCTTTG 
GAGAGAATGAAGTGTACAAGCAGGTGATCTTCGAGGAGGGCTCCTGGGGCCGATGGGTCCAGAAGAAGTTC
CAGAAATACATTGGTTTCGCCCCATGCATCTTCCATGGTCGAGGCCTCTTCTCCTCCGACACCTGGGGGCT 
GGTGCCCTACTCCAAGCCCATCACCACTGTTGTGGGAGAGCCCATCACCATCCCCAAGCTGGAGCACCCAA 
CCCAGCAAGACATCGACCTGTACCACACCATGTACATGGAGGCCCTGGTGAAGCTTTTCGACAAGCACAAG 
ACCAAGTTCGGCCTCCCGGAGACTGAGGTCCTGGAGGTGAACTGAGCCAGCCTTCGGGGCCAATTCCCCTG 
GAGGAACCAGCTGCAAATCACTTTTTTGCTCTGTAAATTTGGAAGTGTCATGGGTGTCTGTGGGTTATTTA
AAAGAAATTATAACAATTTTGCTAAACCATTACAATGTTAGGTCTTTTTTAAGAAGGAAAAAGTCAGTATT
TCAAGTTCTTTCACTTCCAGCTTGCCCTGTTCTAGGTGGTGGCTAAATCTGGGCCTAATCTGGGTGGCTCA
GCTAACCTCTCTTCTTCCCTTCCTGAAGTGACAAAGGAAACTCAGTCTTCTTGGGGAAGAAGGATTGCCAT
TAGTGACTTGGACCAGTTAGATGATTCACTTTTTGCCCCTAGGGATGAGAGGCGAAAGCCACTTCTCATAC
AAGCCCCTTTATTGCCACTACCCCACGCTCGTCTAGTCCTGAAACTGCAGGACCAGTTTCTCTGCCAAGGG
GAGGAGTTGGAGAGCACAGTTGCCCCGTTGTGTGAGGGCAGTAGTAGGCATCTGGAATGCTCCAGTTTGAT
YTCCCTTCTGCCACCCCTACCTCACCCCTAGTCACTCATATCGGAGCCTGGGACTGGGCCTCCAGGATGAG 
GATGGGGGTGGCAATGACACCCTGCAGGGGAAAGGACTGCCCCCCATGCACCATTGCAGGGAGGATGCCGC 
CACCATGAAGCTAGGTGGAGTAACTGGTTTTTCTTGGGTGGCTGATGACATGGATGCAGCACAGACTCAGC 
CTTGGCCTGGAGCACATGCTTACTGGTGGCCTCAGTTTACCTTCCCCAGATCCTAGATTCTGGATGTGAGG 
AAGAGATCCCTCTTCAGAAGGGGCCTGGCCTTCTGAGCAGCAGATTAGTTCCAAAGCAGGTGGCCCCCGAA 
CCCAAGCCTCACTTTTYTGTGCCTTCCTGAGGGGGTTGGGCCGGGGAGGAAACCCAACCCTCTCCTGTGTG 
TTCTGTTATCTYTTGATGAGATCATTGCACCATGTCAGACTTTTGTATATGCCTTGAAAATAAATGAAAGT 
GAGAATCAAAAAAAAAAAAAAAAAAAAAAAA 
Figure 1D

The nucleotide sequence (SEQ ID NO:4) of a second cDNA that encodes BSTP-ECG1 is 
presented below. 

TCCGGGACGCCAGCGCCGCGGCTGCCGCCTCTGCTGGGGTCTAGGCTGTTTCTCTCGCGCCACCACTGGCC 
GCCGGCCGCAGCTCCAGGTGTCCTAGCCGCCCAGCCTCGACGCCGTCCCGGGACCCCTGTGCTCTGCGCGA 
AGCCCTGGCCCCGGGGGCCGGGGCATGGGCCAGGGCGCGGGGTGAAGCGGCTTCCCGCGGGGCCGTGACTG 
GGCGGGCTTCAGCCATGAAGACCCTCATAGCCGCCTACTCCGGGGTCCTGCGCGGCGAGCGTCAGGCCGAG 
GCTGACCGGAGCCAGCGCTCTCACGGAGGACCTGCGCTGTCGCGCGAGGGGTCTGGGAGATGGGGCACTGG 
ATCCAGCATCCTCTCCGCCCTCCAGGACCTCTTCTCTGTCACCTGGCTCAATAGGTCCAAGGTGGAAAAGC 
AGCTACAGGTCATCTCAGTGCTCCAGTGGGTCCTGTCCTTCCTTGTACTGGGAGTGGCCTGCAGTGCCATC 
CTCATGTACATATTCTGCACTGATTGCTGGCTCATCGCTGTGCTCTACTTCACTTGGCTGGTGTTTGACTG
GAACACACCCAAGAAAGGTGGCAGGAGGTCACAGTGGGTCCGAAACTGGGCTGTGTGGCGCTACTTTCGAG
ACTACTTTCCCATCCAGCTGGTGAAGACACACAACCTGCTGACCACCAGGAACTATATCTTTGGATACCAC 
CCCCATGGTATCATGGGCCTGGGTGCCTTCTGCAACTTCAGCACAGAGGCCACAGAAGTGAGCAAGAAGTT 
CCCAGGCATACGGCCTTACCTGGCTACACTGGCAGGCAACTTCCGAATGCCTGTGTTGAGGGAGTACCTGA 
TGTCTGGAGGTATCTGCCCTGTCAGCCGGGACACCATAGACTATTTGCTTTCAAAGAATGGGAGTGGCAAT 
GCTATCATCATCGTGGTCGGGGGTGCGGCTGAGTCTCTGAGCTCCATGCCTGGCAAGAATGCAGTCACCCT 
GCGGAACCGCAAGGGCTTTGTGAAACTGGCCCTGCGTCATGGAGCTGACCTGGTTCCCATCTACTCCTTTG 
GAGAGAATGAAGTGTACAAGCAGGTGATCTTCGAGGAGGGCTCCTGGGGCCGATGGGTCCAGAAGAAGTTC 
CAGAAATACATTGGTTTCGCCCCATGCATCTTCCATGGTCGAGGCCTCTTCTCCTCCGACACCTGGGGGCT 
GGTGCCCTACTCCAAGCCCATCACCACTGTTGTGGGAGAGCCCATCACCATCCCCAAGCTGGAGCACCCAA 
CCCAGCAAGACATCGACCTGTACCACACCATGTACATGGAGGCCCTGGTGAAGCTTTTCGACAAGCACAAG 
ACCAAGTTCGGCCTCCCGGAGACTGAGGTCCTGGAGGTGAACTGAGCCAGCCTTCGGGGCCAATTCCCCTG 
GAGGAACCAGCTGCAAATCACTTTTTTGCTCTGTAAATTTGGAAGTGTCATGGGTGTCTGTGGGTTATTTA 
AAAGAAATTATAACAATTTTGCTAAACCATTAAAAAAAAAAAAAAAAAAAAA 
Figure 2

The consensus sequence derived from I.M.A.G.E. clones 161484, 48805, 1276329,
1343900, and 1560906 (SEQ ID NO:5) is presented below.

TGGCAATTATGAAAAACTTCCCAAGGCCTACCCCTTGGACTAGCCTCTTTAAATACTTCATGTGGTCTCCA 
AAATGACCCCGANTGCGATACCATTCCCATTGTGTOTCTATAAAAACCAGGGCACAGGGGANCATGCAAAC 
AGCCCAAANTTACAAANCCATGAGGTGGAGGGCTAGGATCTGAACCCAGNTCTGTCTGATTCTATANCTGA 
TGCTCTTCTCATATCTAAAAGGGTACCTGTGGGAGGTGAGGTTTGTACTGGGGACCCCATGACTGGAAAAA 
AGGGTGACAGTGGACTGACATCTTCCCTCTGCTGTAGGCACTGGATCCANCATCCTCTCCGCCCTCCANAA 
CCTCTTCTCTNTCACCTGGCTCAATAGGTCCAAGGTGGAAAAGCANCTACAGGTCATCTCAGTGCTCCAGT 
GGGTCCTGTCCTTCCTTGTACTGGGANTGGCCTGCAGTGCCATCCTCATGTACATATTCTGCACTGATTGC 
TGGCTCATCGCTGTGCTCTACTTCACTTGGCTGGTGTTTGACTGGAACACACCCAAGAAAGGTGGCAGGAG 
GTCACAGTGGGTCCGAAACTGGGCTGTGTGGCGCTACTTTCGAGACTACTTTCCCATCCAGCTGGTGAAGA 
CACACAACCTGCTGACCACCAGGAACTATATCTTTGGATACCACCCCCATGGTATCATGGGCCTGGGTGCC 
TTCTGCAACTTCAGCACAGAGGCCACAGAAGTGAGCAAGAAGTTCCCAGGCATACGGCCTTACCTGGCTAC 
ACTGGCAGGCAACTTCCGAATGCCTGTGTTGAGGGAGTACCTGATGTCTGGAGGTATCTGCCCTGTCAGCC 
GGGACACCATAGACTATTTGCTTTCAAAGAATGGGAGTGGCAATGCTATCATCATCGTGGTCGGGGGTGCG 
GCTGAGTCTCTGAGCTCCATGCCTGGCAAGAATGCAGTCACCCTGCGGAACCGCAAGGGCTTTGTGAAACT 
GGCCCTGCGTCATGGAGCTGACCTGGTTCCCATCTACTCCTTTGGAGAGAATGAAGTGTACAAGCAGGTGA
TCTTCGAGGAGGGCTCCTGGGGCCGATGGGTCCAGAAGAAGTTCCAGAAATACATTGGTTTCGCCCCATGC 
ATCTTCCATGGTCGAGGCCTCTTCTCCTCCGACACCTGGGGGCTGGTGCCCTACTCCAAGCCCATCACCAC 
TGTTGTGGGAGAGCCCATCACCATCCCCAAGCTGGAGCACCCAACCCAGCAAGACATCGACCTGTACCACA 
CCATGTACATGGAGGCCCTGGTGAAGCTTTTCGACAAGCACAAGACCAAGTTCGGCCTCCCGGAGACTGAG 
GTCCTGGAGGTGAACTGAGCCAGCCTTCGGGGCCAATTCCCCTGGAGGAACCAGCTGCAAATCACTTTTTT 
GCTCTGTAAATTTGGAAGTGTCATGGGTGTCTGTGGGTTATTTAAAAGAAATTATAACAATTTTGCTAAAC 
OATTAAAAAAAAAAAAAAAAAAAAARAARRAAAAAGTCAGTACT 

TGTTCTAGGTGGTGGCTAAATCTGGGCCTAATCTGGGTGGCTCAGCTAACCTCTCTTCTTCCCTTCCTGAA 
GTGACAAAGGAAACTCAGTCTTCTTGGGGAAGAAGGATTGCCATTAGTGACTTGGACCAGTTAGATGATTC 
ACTTTTTGCCCCTAGGGATGAGAGGCGAAAGCCACTTCTCATACAAGCCCCTTTATTGCCACTACCCCACG 
CTCGTCTAGTCCTGAAACTGCAGGACCAGTTTCTCTGCCAAGGGGAGGAGTTGGAGAGCACAGTTGCCCCG 
TTGTGTGAGGGCAGTAGTAGGCATCTGGAATGCTCCAGTTTGATYTCCCTTCTGCCACCCCTACCTCACCC 
CTAGTCACTCATATCGGAGCCTGGGACTGGGCCTCCAGGATGAGGATGGGGGTGGCAATGACACCCTGCAG 
GGGAAAGGACTGCCCCCCATGCACCATTGCAGGGAGGATGCCGCCACCATGAAGCTAGGTGGAGTAACTGG 
TTTTTCTTGGGTGGCTGATGACATGGATGCAGCACAGACTCAGCCTTGGCCTGGAGCACATGCTTACTGGT 
GGCCTCAGTTTACCTTCCCCAGATCCTAGATTCTGGATGTGAGGAAGAGATCCCTCTTCAGAAGGGGCCTG 
GCCTTCTGAGCAGCAGATTAGTTCCAAAGCAGGTGGCCCCCGAACCCAAGCCTCACTTTTYTGTGCCTTCC
TGAGGGGGTTGGGCCGGGGAGGAAACCCAACCCTCTCCTGTGTGTTCTGTTATCTYTTGATGAGATCATTG
CACCATGTCAGACTTTTGTATATGCCTTGAAAATAAATGAAAGTGAGAATCAAAAAAAAAAAAAAAAAAAA 
AAAA 

[Figure: the BSTP-ECG1 cDNA nucleotide sequence of Figure 1C, numbered at 100-nucleotide intervals, with the encoded amino acid sequence of Figure 1A shown beneath the open reading frame.]